Posts

[Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms 2024-11-04T10:15:35.550Z
Consciousness As Recursive Reflections 2024-10-05T20:00:53.053Z
Hyperpolation 2024-09-15T21:37:00.002Z
Rationalist Purity Test 2024-07-09T20:30:05.421Z
Bed Time Quests & Dinner Games for 3-5 year olds 2024-06-22T07:53:38.989Z
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems 2024-05-16T13:09:39.265Z
[Linkpost] Silver Bulletin: For most people, politics is about fitting in 2024-05-01T18:12:43.238Z
KAN: Kolmogorov-Arnold Networks 2024-05-01T16:50:58.124Z
Claude 3 Opus can operate as a Turing machine 2024-04-17T08:41:57.209Z
Leave No Context Behind - A Comment 2024-04-11T22:50:26.100Z
aintelope project update 2024-02-08T18:32:00.000Z
[Linkpost] Contra four-wheeled suitcases, sort of 2023-09-12T20:36:02.412Z
Trying AgentGPT, an AutoGPT variant 2023-04-13T10:13:41.316Z
What is good Cyber Security Advice? 2022-10-24T23:27:58.428Z
[Fun][Link] Alignment SMBC Comic 2022-09-09T21:38:54.400Z
Hamburg, Germany – ACX Meetups Everywhere 2022 2022-08-20T19:18:48.685Z
Brain-like AGI project "aintelope" 2022-08-14T16:33:39.571Z
Robin Hanson asks "Why Not Wait On AI Risk?" 2022-06-26T23:32:19.436Z
[Link] Childcare : what the science says 2022-06-24T21:45:23.406Z
[Link] Adversarially trained neural representations may already be as robust as corresponding biological neural representations 2022-06-24T20:51:27.924Z
What are all the AI Alignment and AI Safety Communication Hubs? 2022-06-15T16:16:03.241Z
Silly Online Rules 2022-06-08T20:40:41.076Z
LessWrong Astralcodex Ten Meetup June 2022 2022-05-29T22:43:09.431Z
[Linkpost] A conceptual framework for consciousness 2022-05-02T01:05:36.129Z
ACX Spring 2022 Meetup Hamburg 2022-04-25T21:44:11.141Z
Does non-access to outputs prevent recursive self-improvement? 2022-04-10T18:37:54.332Z
Unfinished Projects Thread 2022-04-02T17:12:52.539Z
[Quote] Why does i show up in Quantum Mechanics and other Beautiful Math Mysteries 2022-03-16T11:58:30.526Z
Estimating Brain-Equivalent Compute from Image Recognition Algorithms 2022-02-27T02:45:21.801Z
[Linkpost] TrojanNet: Embedding Hidden Trojan Horse Models in Neural Networks 2022-02-11T01:17:42.119Z
[Linkpost] [Fun] CDC To Send Pamphlet On Probabilistic Thinking 2022-01-14T21:44:57.313Z
[Linkpost] Being Normal by Brian Caplan 2021-11-27T22:19:18.051Z
[Linkpost] Paul Graham 101 2021-11-14T16:52:02.415Z
Successful Mentoring on Parenting, Arranged Through LessWrong 2021-10-21T08:27:57.794Z
Quote Quiz 2021-08-30T23:30:52.067Z
What do we know about vaccinating children? 2021-08-04T23:57:15.399Z
Calibrating Adequate Food Consumption 2021-03-27T00:00:56.953Z
Gunnar_Zarncke's Shortform 2021-01-02T02:51:36.511Z
Linkpost: Choice Explains Positivity and Confirmation Bias 2020-10-01T21:46:46.289Z
Slatestarcodex Meetup Hamburg 2019-11-17 2019-10-27T22:29:27.835Z
Welcome to SSC Hamburg [Edit With Your Details] 2019-09-24T21:35:10.473Z
Slatestarcodex Meetup in Hamburg, Germany 2019-09-09T21:42:25.576Z
Percent reduction of gun-related deaths by color of gun. 2019-08-06T20:28:56.134Z
Open Thread April 2018 2018-04-06T21:02:38.311Z
Intercellular competition and the inevitability of multicellular aging 2017-11-04T12:32:54.879Z
Polling Thread October 2017 2017-10-07T21:32:00.810Z
[Slashdot] We're Not Living in a Computer Simulation, New Research Shows 2017-10-03T10:10:07.587Z
Interpreting Deep Neural Networks using Cognitive Psychology (DeepMind) 2017-07-10T21:09:51.777Z
Using Machine Learning to Explore Neural Network Architecture (Google Research Blog) 2017-06-29T20:42:00.214Z
Does your machine mind? Ethics and potential bias in the law of algorithms 2017-06-28T22:08:26.279Z

Comments

Comment by Gunnar_Zarncke on Gunnar_Zarncke's Shortform · 2024-11-18T13:36:17.633Z · LW · GW

agents that have preferences about the state of the world in the distant future

What are these preferences? For biological agents, these preferences are grounded in some mechanism - what you call the Steering System - that evaluates "desirable states" of the world in some more or less directly measurable way (grounded in perception via the senses) and derives a signal of how desirable the state is, which the brain then optimizes for. For ML models, the mechanism is somewhat different, but there is also an input to the training algorithm that determines how "good" the output is. This signal is called reward, and it drives the system toward outputs that lead to states of high reward. But the path there depends on the specific optimization method, and the algorithm has to navigate such a complex loss landscape that it can get stuck in regions of the search space that correspond to imperfect models for a very long time, if not forever. These imperfect models can be off in significant ways, which is why it may be useful to say that Reward is not the optimization target.
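
To make the "stuck in an imperfect model" point concrete, here is a minimal sketch (my own toy illustration, not from the linked post): plain gradient descent on a one-dimensional loss with two minima settles into whichever basin it starts in, and more steps alone don't get it out.

```python
# Toy sketch: gradient descent can settle into a worse ("imperfect") minimum
# and stay there, no matter how many additional steps it takes.

def loss(x):
    return 0.1 * x**4 - x**2 + 0.5 * x  # global minimum near x ~ -2.35, local minimum near x ~ 2.1

def grad(x):
    return 0.4 * x**3 - 2 * x + 0.5  # derivative of loss

def descend(x0, lr=0.01, steps=10_000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_good = descend(-1.0)
x_stuck = descend(+1.0)
print(x_good, loss(x_good))    # lands near the global minimum (~ -2.35)
print(x_stuck, loss(x_stuck))  # lands near the local minimum (~ 2.1): the "imperfect model"
```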

The connection to Intuitive Self-Models is that even though the internal models of an LLM may be very different from human self-models, I think it is still quite plausible that LLMs and other models form models of the self. Such models are instrumentally convergent. Humans talk about the self, and the LLM does things that match these patterns. Maybe the underlying process in humans that gives rise to this is different, but humans learning about this can't know the actual process either. And in the same way, the approximate model the LLM forms is not maximizing the reward signal and can be quite far from it, as long as it is useful (in the sense of having higher reward than other such models/parameter combinations).

I think of my toenail as “part of myself”, but I’m happy to clip it.

Sure, the (body of the) self can include parts that can be cut/destroyed without that "causing harm" but instead having an overall positive effect. By analogy, an AI in a compute center would also consider decommissioning failed hardware. And when defining humanity, we do have to be careful about what we mean when such "parts" could be humans.

Comment by Gunnar_Zarncke on Self-Other Overlap: A Neglected Approach to AI Alignment · 2024-11-18T13:00:43.611Z · LW · GW

About conjoined twins and the self:

Krista and Tatiana Hogan (Wikipedia) are healthy, functional craniopagus twins who are joined at the head and share parts of the brain - their thalami are joined via a thalamic bridge. They can report on perceptions of the other and share affects.
I couldn't find scientific papers that studied their brain function rigorously, but the paper A Case of Shared Consciousness looks at evidence from documentaries and discusses it. Here are some observational details:
 

Each is capable of reporting on inputs presented to the other twin’s body. For example, while her own eyes are covered, Tatiana is able to report on visual inputs to both of Krista’s eyes. Meanwhile, Krista can report on inputs to one of Tatiana’s eyes. Krista is able to report and experience distaste towards food that Tatiana is eating (the reverse has not been reported, but may also be true). An often repeated anecdote is that while Tatiana enjoys ketchup on her food, Krista will try to prevent her eating it. Both twins can also detect when and where the other twin’s body is being touched, and their mother reports that they find this easier than visual stimuli.
fMRI imaging revealed that Tatiana’s brain ‘processes signals’ from her own right leg, both her arms, and Krista’s right arm (the arm on the side where they connect). Meanwhile Krista’s brain processes signals from her own left arm, both her own legs and Tatiana’s left leg (again on the side where they connect). Each twin is able to voluntarily move each of the limbs corresponding to these signals.
The twins are also capable of voluntary bodily control for all the limbs within their ordinary body plans. As their mother Felicia puts it, “they can choose when they want to do it, and when they don’t want to do it.”
The twins also demonstrate a common receptivity to pain. When one twin’s body is harmed, both twins cry.
The twins report that they talk to each other in their heads. This had previously been suspected by family members due to signs of apparent collusion without verbalisation.

Popular article How Conjoined Twins Are Making Scientists Question the Concept of Self contains many additional interesting bits:

when a pacifier was placed in one infant’s mouth, the other would stop crying.

About the self:

 Perhaps the experience of being a person locked inside a bag of skin and bone—with that single, definable self looking out through your eyes—is not natural or given, but merely the result of a changeable, mechanical arrangement in the brain. Perhaps the barriers of selfhood are arbitrary, bendable. This is what the Hogan twins’ experience suggests. Their conjoined lives hint at myriad permeations of the bodily self.

About qualia:

Tatiana senses the greeniness of Krista’s experience all the time. “I hate it!” she cries out, when Krista tastes some spinach dip.

(found via FB comment)

Comment by Gunnar_Zarncke on OpenAI Email Archives (from Musk v. Altman) · 2024-11-18T11:34:05.798Z · LW · GW

A much smaller subset was also published here, but it does include the documents:

https://www.techemails.com/p/elon-musk-and-openai?r=1jki4r 

Comment by Gunnar_Zarncke on Gunnar_Zarncke's Shortform · 2024-11-17T06:36:09.817Z · LW · GW

Instrumental power-seeking might be less dangerous if the self-model of the agent is large and includes individual humans, groups, or even all of humanity and if we can reliably shape it that way.

It is natural for humans to form a self-model that is bounded by the body, though it is also common for the self-model to include only the brain or the mind, and there are other self-models. See also Intuitive Self-Models.

It is not clear what the self-model of an LLM agent would be. It could be

  • the temporary state of the execution of the model (or models),
  • the persistently running model and its memory state,
  • the compute resources (CPU/GPU/RAM) allocated to run the model and its collection of support programs,
  • the physical compute resources in some compute center(s),
  • the compute center as an organizational structure that includes the staff to maintain and operate not only the machines but also the formal organization (after all, without that, the machines will eventually fail), or
  • ditto, but including all the utilities and suppliers needed to continue operating it.

There is not as clear a physical boundary as in the human case. But even in the human case, babies especially depend on caregivers to a large degree.

There are indications that we can shape the self-model of LLMs: Self-Other Overlap: A Neglected Approach to AI Alignment

Comment by Gunnar_Zarncke on Alexander Gietelink Oldenziel's Shortform · 2024-11-17T06:16:06.427Z · LW · GW

This sounds related to my complaint about the YUDKOWSKY + WOLFRAM ON AI RISK debate:

I wish there had been some effort to quantify @stephen_wolfram's "pockets of irreducibility" (section 1.2 & 4.2) because if we can prove that there aren't many or they are hard to find & exploit by ASI, then the risk might be lower.

I got this tweet wrong. I meant: if pockets of irreducibility are common and non-pockets are rare and hard to find, then the risk from superhuman AI might be lower. I think Stephen Wolfram's intuition has merit but needs more analysis to be convincing.

Comment by Gunnar_Zarncke on The Packaging and the Payload · 2024-11-13T13:04:43.355Z · LW · GW

There are two parts to the packaging that you have mentioned:

  • optimizing transport (not breaking the TV) is practical and involves everything but the receiver
  • enhancing reception (nice present wrapping) is cultural and involves the receiver(s)

Comment by Gunnar_Zarncke on You don't get to have cool flaws · 2024-11-09T06:51:03.610Z · LW · GW

Law of equal (or not so equal) opposite advice: There are some - probably few - flaws that you can keep, because they are small and not worth the effort to fix, or because they make you more lovable and unique.

Example:

  • I'm a very picky eater. No sauces, no creams, no spicy foods. Lots of things excluded. It limits what I can eat, and I always have to explain.

But don't presume any flaw you are attached to falls into this category. I'm also not strongly convinced of this.

Comment by Gunnar_Zarncke on [Intuitive self-models] 8. Rooting Out Free Will Intuitions · 2024-11-05T13:43:46.246Z · LW · GW

a lot of the current human race spends a lot of time worrying - which I think probably has the same brainstorming dynamic and shares mechanisms with the positively oriented brainstorming. I don't know how to explain this; I think the avoidance of bad outcomes being a good outcome could do this work, but that's not how worrying feels - it feels like my thoughts are drawn toward potential bad outcomes even when I have no idea how to avoid them yet.

If we were not able to think about potentially bad outcomes well, that would be a problem, since clearly thinking about them is, hopefully, what avoids them. But the question is a good one. My first intuition was that maybe the importance of an outcome - in both directions, good and bad - is relevant.

Comment by Gunnar_Zarncke on [Intuitive self-models] 8. Rooting Out Free Will Intuitions · 2024-11-05T13:38:22.197Z · LW · GW

I like the examples from 8.4.2:

  • Note the difference between saying (A) “the idea of going to the zoo is positive-valence, a.k.a. motivating”, versus (B) “I want to go to the zoo”. [...]
  • Note the difference between saying (A) “the idea of closing the window popped into awareness”, versus (B) “I had the idea to close the window”. Since (B) involves the homunculus as a cause of new thoughts, it’s forbidden in my framework. 

I think it could be an interesting mental practice to rephrase inner speech involving "I" in this way. I have been doing this for a while now. It started toward the end of my last meditation retreat when I switched to a non-CISM (or should I say "there was a switch in the thoughts about self-representation"?). Using "I" in mental verbalization felt like a syntax error, and other phrasings, like the ones you are suggesting here, felt more natural. Interestingly, it still makes sense to use "I" in conversations to refer to me (the speaker). I think that is part of why the CISM is so natural: It uses the same element in internal and external verbalizations.[1]

Pondering your examples, I think I would render them differently. Instead of: "I want to go to the zoo," it could be: "there is a desire to go to the zoo." Though I guess if "desire to" stands for "positive-valence thought about," it is very close to your "the idea of going to the zoo is positive-valence."

In practice, the thoughts would be smaller, more like "there is [a sound][2]," "there is a memory of [an animal]," "there is a memory of [an episode from a zoo visit]," "there is a desire to [experience zoo impressions]," "there is a thought of [planning]." The latter gets complicated. The thought of planning could be positive valence (because plans often lead to desirable outcomes) or the planning is instrumentally useful to get the zoo impressions (which themselves may be associated with desirable sights and smells), or the planning can be aversive (because effortful), but still not strong enough to displace the desirable zoo visit. 

For an experienced meditator, the fragments that can be noticed can be even smaller - or maybe more precursor-like. This distinction is easier to see with a quiet mind, where, before a thought fully occupies attention, glimmers of thoughts may bubble up[3]. This is related to noticing that attention is shifting. The everyday version of that happens when you notice that you got distracted by something. The subtler form is noticing small shifts during your regular thinking (e.g., I just noticed my attention shifting to some itch, without that really interrupting my writing flow). But I'm not sure how much of that is really a sense of attention vs. a retroactive interpretation of the thoughts. Maybe a more competent meditator can comment.

 

  1. ^

    And now I wonder whether the phonological loop, or whatever is responsible for language-like thoughts, maybe subvocalizations, is what makes the CISM the default model.

  2. ^

    [brackets indicate concepts that are described by words, not the words themselves]

  3. ^

    The question is though, what part notices the noticing. Some thought of [noticing something] must be sufficiently stable and active to do so.  

Comment by Gunnar_Zarncke on [Intuitive self-models] 2. Conscious Awareness · 2024-11-05T09:52:23.048Z · LW · GW

I think your explanation in section 8.5.2 resolves our disagreement nicely. You refer to S(X) thoughts that "spawn up" successive thoughts that eventually lead to X (I'd say X') actions shortly after (or much later), while I was referring to S(X) that cannot give rise to X immediately. I think the difference was that you are more lenient with what X can be, such that S(X) can be about an X that happens much later, which wouldn't work in my model of thoughts.

Explicit (self-reflective) desire

Statement: “I want to be inside.”

Intuitive model underlying that statement: There’s a frame (§2.2.3) “X wants Y” (§3.3.4). This frame is being invoked, with X as the homunculus, and Y as the concept of “inside” as a location / environment.

How I describe what’s happening using my framework: There’s a systematic pattern (in this particular context), call it P, where self-reflective thoughts concerning the inside, like “myself being inside” or “myself going inside”, tend to trigger positive valence. That positive valence is why such thoughts arise in the first place, and it’s also why those thoughts tend to lead to actual going-inside behavior.

In my framework, that’s really the whole story. There’s this pattern P. And we can talk about the upstream causes of P—something involving innate drives and learned heuristics in the brain. And we can likewise talk about the downstream effects of P—P tends to spawn behaviors like going inside, brainstorming how to get inside, etc. But “what’s really going on” (in the “territory” of my brain algorithm) is a story about the pattern P, not about the homunculus. The homunculus only arises secondarily, as the way that I perceive the pattern P (in the “map” of my intuitive self-model).

Comment by Gunnar_Zarncke on Could orcas be (trained to be) smarter than humans?  · 2024-11-05T05:52:38.718Z · LW · GW

As I commented on Are big brains for processing sensory input? I predict that the brain regions of a whale or Orca responsible for spatiotemporal learning and memory are a big part of their encephalization. 

Comment by Gunnar_Zarncke on Human Biodiversity (Part 4: Astral Codex Ten) · 2024-11-03T14:56:16.716Z · LW · GW

I'm not disagreeing with this assessment. The author has an agenda, but I don't think it's hidden in any way. It is mostly word thinking and social association. But that's how the opposition works!  

Comment by Gunnar_Zarncke on Testing "True" Language Understanding in LLMs: A Simple Proposal · 2024-11-03T13:01:49.236Z · LW · GW

I believe this has been done in Google's Multilingual Neural Machine Translation (GNMT) system that enables zero-shot translations (translating between language pairs without direct training examples). This system leverages shared representations across languages, allowing the model to infer translations for unseen language pairs.
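
As a rough illustration of the shared-representation idea (not GNMT itself; this sketch assumes the sentence-transformers package and its multilingual paraphrase-multilingual-MiniLM-L12-v2 checkpoint): sentences with the same meaning in different languages land close together in a single embedding space, which is what makes zero-shot transfer between unseen language pairs plausible.

```python
# Sketch only: illustrates shared multilingual representations, not GNMT.
# Assumes `pip install sentence-transformers` and the named checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "The cat sits on the mat.",        # English
    "Die Katze sitzt auf der Matte.",  # German, same meaning
    "The stock market fell sharply.",  # English, different meaning
]
emb = model.encode(sentences)

# Same meaning across languages should score much higher than
# different meanings within the same language.
print(util.cos_sim(emb[0], emb[1]))  # high similarity
print(util.cos_sim(emb[0], emb[2]))  # low similarity
```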

Comment by Gunnar_Zarncke on Human Biodiversity (Part 4: Astral Codex Ten) · 2024-11-03T11:10:10.890Z · LW · GW

The link posted above is a lengthy and relatively well-sourced, if biased, post about Scott Alexander's writing related to human biodiversity (HBD). The author is very clearly opposed to HBD. I think it is a decent read if you want to understand that position.

Comment by Gunnar_Zarncke on AI Safety Camp 10 · 2024-10-30T16:10:34.276Z · LW · GW

Thanks. I already got in touch with Masaharu Mizumoto.

Comment by Gunnar_Zarncke on [Intuitive self-models] 7. Hearing Voices, and Other Hallucinations · 2024-10-29T14:42:02.096Z · LW · GW

Congrats again for the sequence! It all fits together nicely.

While it makes sense to exclude hallucinogenic drugs and seizures, at least hallucinogenic drugs seem to fit into the pattern if I understand the effect correctly.

Auditory hallucinations, top-down processing and language perception - this paper says that imbalances in top-down cortical regulation are responsible for auditory hallucinations:

Participants who reported AH in the week preceding the test had a higher false alarm rate in their auditory perception compared with those without such (recent) experiences. 

And this page Models of psychedelic drug action: modulation of cortical-subcortical circuits says that hallucinogenic drugs lead to such imbalances. So it is plausibly the same mechanism.

Comment by Gunnar_Zarncke on Gwern: Why So Few Matt Levines? · 2024-10-29T13:08:15.689Z · LW · GW

  • Scott Alexander for psychiatry and drugs and many other topics
  • Paul Graham for startups specifically, but his essays cover a much wider space
  • Scott Adams for persuasion, humor, and recently a lot of political commentary - not neutral; he has his own agendas
  • Robin Hanson - Economics, esp. long-term, very much out-of-distribution thinking
  • Zvi Mowshowitz for AI news (and some other research-heavy topics; previously COVID-19)

I second Patrick McKenzie.

Comment by Gunnar_Zarncke on Towards the Operationalization of Philosophy & Wisdom · 2024-10-29T08:34:26.566Z · LW · GW

I really like this one! I just wish you had split it into two posts, one for Philosophy and one for Wisdom.

Comment by Gunnar_Zarncke on Towards the Operationalization of Philosophy & Wisdom · 2024-10-29T08:33:24.258Z · LW · GW

Finally someone gets Philosophy! Though admittedly, most of philosophy is not about philosophy these days. It is a tradition of knowledge that has lost much of its footing (see On the Loss and Preservation of Knowledge). But that's true of much of science and shouldn't lead us to ignore this core function of Philosophy: Study confusing questions in the absence of guiding structure.

The case of ethics as a field of Philosophy is interesting because it has been a part of it for so long. It suggests that people have tried and repeatedly failed to find a working ontology and make Ethics into its own paradigmatic field. I think this is so because ethics is genuinely difficult - partly because Intuitive Self-Models are so stable and useful but not veridical. But I think we will eventually succeed and be able to "Calculate the Morality Matrix."

Source: https://www.smbc-comics.com/comic/matrix 

Comment by Gunnar_Zarncke on AI Safety Camp 10 · 2024-10-27T06:29:10.650Z · LW · GW

Hi, is there a way to get people in touch with a project or project lead? For example, I'd like to get in touch with Masaharu Mizumoto because iVAIS sounds related to the aintelope project. 

Comment by Gunnar_Zarncke on The Case For Bullying · 2024-10-27T06:21:28.107Z · LW · GW

The post was likely downvoted because it conflicts with principles of empathy, cooperation, and intellectual rigor. Defending bullying, even provocatively, clashes with commonly held beliefs. The zero-sum framing of status is overly simplistic, ignoring positive-sum approaches. The provocative style comes off as antagonistic. Reframing the argument around prosocial accountability might get more positive responses.

Comment by Gunnar_Zarncke on [Intuitive self-models] 2. Conscious Awareness · 2024-10-25T20:16:01.660Z · LW · GW

Thanks. It doesn't help because we already agreed on these points. 

We both understand that there is a physical process in the brain - neurons firing etc. - as you describe in 3.3.6, that gives rise to a) S(A), b) A, and c) the precursors to both, as measured by Libet and others.

We both know that people's self-reports are unreliable and informed by their intuitive self-models. To illustrate that I understand 2.3, let me give an example: My son has figured out that people hear what they expect to hear and experimented with leaving out fragments of words or sentences, enjoying himself over how people never noticed anything was off (example: "ood morning"). Here, the missing part doesn't make it into people's awareness, even though the sentence as a whole very much does.

I'm not asserting that there is nothing upstream of S(A) that is causing it. I'm asserting that an individual S(A) is not causing A. I assert this because, timing-wise, it can't, and equivalently, because there is no neurological action path from S(A) to A. The only relation between S(A) and A is that their co-occurrence has been statistically associated with positive valence in the past. And this co-occurrence is facilitated by a common precursor. But saying S(A) is causing A is as right or wrong as saying A is causing S(A).

Comment by Gunnar_Zarncke on [Intuitive self-models] 2. Conscious Awareness · 2024-10-25T14:52:36.079Z · LW · GW

I mean this (my summary of the Libet experiments and their replications):

  • Brain activity detectable with EEG (the Readiness Potential) begins between 350 ms and multiple seconds (depending on experiment and measurement resolution) before the person consciously feels the intention to act (voluntary motor movement).
  • Subjects report becoming aware of their intention to act (via clock tracking) about 200 ms before the action itself (e.g., pressing a button). The 200 ms seems relatively fixed, but cognitive load can delay it.

To give a specific quote:

Matsuhashi and Hallett: Our result suggests that the perception of intention rises through multiple levels of awareness, starting just after the brain initiates movement.

[...]

1. The first detected event in most subjects was the onset of BP. They were not aware of the movement genesis at this time, even if they were alerted by tones. 
2. As the movement genesis progressed, the awareness state rose higher and after the T time, if the subjects were alerted, they could consciously access awareness of their movement genesis as intention. The late BP began within this period. 
3. The awareness state rose even higher as the process went on, and at the W time it reached the level of meta-awareness without being probed. In Libet et al’s clock task, subjects could memorize the clock position at this time. 
4. Shortly after that, the movement genesis reached its final point, after which the subjects could not veto the movement any more (P time).

[...]

We studied the immediate intention directly preceding the action. We think it best to understand movement genesis and intention as separate phenomena, both measurable. Movement genesis begins at a level beyond awareness and over time gradually becomes accessible to consciousness as the perception of intention.

Now, I think you'd say that what they measured wasn't S(A) but something else that is causally related, but then you are moving farther away from patterns we can observe in the brain. And your theory still has to explain the subclass of those S(A) that they did measure. The participants apparently thought these to be their decisions S(A) about their actions A. 

Comment by Gunnar_Zarncke on [Intuitive self-models] 2. Conscious Awareness · 2024-10-25T13:30:36.235Z · LW · GW

We’re definitely talking past each other somehow.

I guess this will only stop when we have made our thoughts clear enough for an implementation that allows us to inspect the system for S(A) and A. Which is OK.

At least this has helped clarify that you think of S(A) as (often) preceding A by a lot, which wasn't clear to me. I think this complicates the analysis because of the question of where to draw the line. Would it count if I imagine throwing the ball one day (S(A)) but execute it during the game the next day, as intended?

What do you make of the Libet experiments?

Comment by Gunnar_Zarncke on [Intuitive self-models] 2. Conscious Awareness · 2024-10-25T10:34:02.875Z · LW · GW

There is just one problem that Libet discovered: There is no time for S(A) to cause A.

My favorite example is throwing a ball: A is the releasing of the ball at the right moment to hit a target. This requires millisecond precision of the release. The S(A) is precisely timed to coincide with the release. It feels like you are releasing the ball at the moment your hand releases it. But that can't be true, because the signal from the brain alone takes longer than the duration of a thought. If your theory were right, you would feel the intention to release the ball and a moment later would have the sensation of the result happening.

Now, one solution around this would be to time-tag thoughts and reorder them afterwards, maybe in memory - a bit like out-of-order execution in CPUs handles parallel execution of sequential instructions. But I'm not sure that is what is going on or that you think it is.

So, my conclusion is that there is a common cause of both S(A) and A.

And my interpretation of Daniel Ingram's comments is different from yours.

In Mind and Body, the earliest insight stage, those who know what to look for and how to leverage this way of perceiving reality will take the opportunity to notice the intention to breathe that precedes the breath, the intention to move the foot that precedes the foot moving, the intention to think a thought that precedes the thinking of the thought, and even the intention to move attention that precedes attention moving.

These "intentions to think/do" that Ingraham refers to are not things untrained people can notice. There are things in the mind that precede the S(A) and A and cause them but people normally can't notice them and thus can't be S(A). I say these precursors are the same things picked up in the Libet experiments and neurological measurements.

Comment by Gunnar_Zarncke on Word Spaghetti · 2024-10-24T21:16:02.242Z · LW · GW

I think what you are looking for is this:

Scott Alexander:

People used to ask me for writing advice. And I, in all earnestness, would say “Just transcribe your thoughts onto paper exactly like they sound in your head.” It turns out that doesn’t work for other people. Maybe it doesn’t work for me either, and it just feels like it does.

and

I’ve written a few hundred to a few thousand words pretty much every day for the past ten years.

But as I’ve said before, this has taken exactly zero willpower. It’s more that I can’t stop even if I want to. Part of that is probably that when I write, I feel really good about having expressed exactly what it was I meant to say. Lots of people read it, they comment, they praise me, I feel good, I’m encouraged to keep writing, and it’s exactly the same virtuous cycle as my brother got from his piano practice.

(the included link is also directly relevant)

Comment by Gunnar_Zarncke on [Intuitive self-models] 2. Conscious Awareness · 2024-10-24T21:03:52.817Z · LW · GW

You give the example of the door that is sometimes pushed open, but let me give alternative analogies: 

  • S(A): Forecaster: "The stock price of XYZ will rise tomorrow." A: XYZ's stock rises the next day.
  • S(A): Drill sergeant: "There will be exercises at 14:00 hours." A: Military units start their exercises at the designated time.
  • S(A): Live commentator: "The rocket is leaving the launch pad." A: A rocket launches from the ground.

Clearly, there is a reason for the co-occurrence, but it is not one causing the other. And it is useful to have the forecaster because making the prediction salient helps improve predictions. Making the drill time salient improves punctuality or routine or something. Not sure what the benefit of the rocket launch commentary is.

Otherwise I think we agree.

Comment by Gunnar_Zarncke on [Intuitive self-models] 2. Conscious Awareness · 2024-10-23T14:42:56.453Z · LW · GW

I want to comment on the interpretation of S(A) as an "intention" to do A.

Note that I'm coming back here from section 6. Awakening / Enlightenment / PNSE, so if somebody hasn't read that, this might be unclear.

Using the terminology above, A here is "the patterns of motor control and attention control outputs that would collectively make my muscles actually execute the standing-up action."

And S(A) is "the patterns of motor control and attention control outputs that would collectively make my muscles actually execute the standing-up action are in my awareness." Meaning a representation of "awareness" is active together with the container-relationship and a representation of A. (I am still very unsure about how "awareness" is learned and represented.)[1]

Referring to 2.6.2, I agree with this:

[S(A) and A] are obviously strongly associated with each other. They can activate simultaneously. And even if they don’t, each tends to bring the other to mind, such that the valence of one influences the valence of the other.

and

For any action A where S(A) has positive valence, there’s often a two-step temporal sequence: [S(A) ; A actually happens]

I agree that in this co-occurrence sense "S(X) often summons a follow-on thought of X." But it is not causing it, which is what "summon" might imply. This choice of word is maybe an indication of the uncertainty here.

Clearly, action A can happen without S(A) being present. In fact, actions are often more effectively executed if you don't think too hard about them[citation needed]. An S(A) is not required. Maybe S(A) and A co-occur often, but that doesn't imply causality. But, indeed, it would seem to be causal in the context of a homunculus model of action. Treating it as causal/vitalistic is predictive. The real reason is the co-occurrence of the thoughts, which can have a common cause, such as when the S(A) thought brings up additional associations that lead to higher-valence thoughts/actions later (e.g., chains of S(A), A, S(A)->S(B), B).

Thus, S(A) isn't really an "Intention to do A" per se but just as it says on the tin: "awareness of (expecting) A." I would say it is only an "intention to do A" if the thought S(A) also includes the concept of intention - which is a concept tied to the homunculus and an intuitive model of agency.

 

  1. ^

    I am still very unsure about how "awareness" is learned and represented. Above it says 

    the cortex, which has a limited computational capacity that gets deployed serially [...] When this aspect of the brain algorithm is itself incorporated into a generative model via predictive (a.k.a. self-supervised) learning, it winds up represented as an “awareness” concept,

    but this doesn't say how. The brain needs to observe something (sense, interoception) from which it can infer this. In what observations would that pattern show up? The serial processing is a property the brain can't observe unless there is some way to combine/compare past and present "thoughts." That's why I have long thought that there has to be feedback from the current thought back as an input signal (thoughts as observations). Such a connection is not present in the brain-like model, but it might not be the only way. Another way would be via memory. If a thought is remembered, then one way of implementing memory would be to provide a representation of the remembered thought as input. In any case, there must be a relation between successive thoughts; otherwise they couldn't influence each other.

    It seems plausible that, in a sequence of events, awareness S(A) is related to a pattern of A having occurred previously in the sequence (or being expected to occur).

Comment by Gunnar_Zarncke on [Intuitive self-models] 6. Awakening / Enlightenment / PNSE · 2024-10-23T10:49:41.726Z · LW · GW

After reading all the 2.6 and 3.3 sections again, I think the answer to why the homunculus is attention-grabbing is that it involves "continuous self-surprise" in the same way an animate object (a mouse...) does. A surprise that is present as a proprioceptive signal or felt sense. With PNSE, your brain has learned to predict the internal S(X) mental objects and this signal well enough that the remaining surprisingness of the mental processes would be more like the gears contraption from 3.3.2, where "the surprising feeling that I feel would be explained away by a different ingredient in my intuitive model of the situation—namely, my own unfamiliarity with all the gears inside the contraption and how they fit together." And as such, it is easier to tune out: The mind is doing its usual thing. Process as usual.

Comment by Gunnar_Zarncke on Word Spaghetti · 2024-10-23T08:19:14.717Z · LW · GW

You are not alone. Paul Graham has been writing essays for a long time, and he is revising and rewriting a lot too. Here you can see him write one of his essays as an edit replay.

Also: "only one sentence in the final version is the same in the first draft."

Also:

The drafts of the essay I published today. This history is unusually messy. There's a gap while I went to California to meet the current YC batch. Then while I was there I heard the talk that made me write "Founder Mode." Plus I started over twice, most recently 4 days ago.

Comment by Gunnar_Zarncke on Sleeping on Stage · 2024-10-22T11:51:22.643Z · LW · GW

I think children can sleep in most places as long as they feel safe. Some parents seem to think that their children can only sleep in tightly controlled environments: quiet, dark, comfy. But I think that is often a result of training. If the children never sleep in any other environments, how can they suddenly feel safe there? Or if the parents or other people are stressed in the other environments, children will notice that something is off, not feel safe, and not sleep. But a place with lots of friendly, happy people seems quite safe to me.

I found a photo of two of my kids sleeping "on stage." This table was right next to the stage at my sister's wedding, and the music was not quiet, for sure.

Comment by Gunnar_Zarncke on Against empathy-by-default · 2024-10-18T10:01:43.011Z · LW · GW

I do think that there are mechanisms in the human brain that make prosocial behavior more intrinsically rewarding, such as the mechanisms you pointed out in the Valence sequence. 

But I also notice that in the right kind of environments, "being nice to people" may predict "people being nice to you" (in a primary reward sense) to a higher degree than might be intuitive. 

I don't think that's enough because you still need to ensure that the environment is sufficiently likely to begin with, with mechanisms such as rewarding smiles, touch inclinations, infant care instincts or whatever. 

I think this story of how human empathy works may plausibly involve both social instincts as well as the self-interested indirect reward in very social environments.

Comment by Gunnar_Zarncke on Against empathy-by-default · 2024-10-17T20:53:15.992Z · LW · GW

I think the point we agree on is

habits that last through adulthood [because] the adult independently assesses those habits as being more appealing than alternatives,

I think that the habit of being nice to people is empathy.

So by the same token, when I was a little kid, yes it was in my self-interest (to some extent) for my parents to be healthy and happy. But that stopped being true as soon as I was financially independent. Why assume that people would permanently internalize that, when they fail to permanently internalize so many other aspects of childhood?

I'm not claiming that they "permanently internalize" it, but that they correctly (well, modulo mistakes) predict that it is in their interest. You started driving a car because you correctly predicted that the situation/environment had changed. But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about those behaviors.

Actually it’s worse than that—adolescents are notorious for not feeling motivated by the well-being of their parents, even while such well-being is still in their own narrow self-interest!! :-P

That depends on the type of well-being and your ability to predict it. And maybe other priorities get in the way during that age. And again, I'm not claiming unconditional goodness. The environment of young adults is clearly different from that of children, but it is comparable enough to predict positive value from being nice to your parents. 

Actually, psychopaths prove this point: The anti-social behavior is "learned" in many cases during abusive childhood experiences, i.e., in environments where it was exactly not in their interest to be nice - because it didn't benefit them. And on the other side, psychopaths can, in many cases, function and show prosocial behaviors in stable environments with strong social feedback. 

This also generalizes to the cultures example.

As an example of (2), a religious person raised in a religious community might stay religious by default. Until, that is, they move to the big city

I agree: In the city, many of their previous predictions of which behaviors exactly lead to positive feedback ("quoting the Bible") might be off and they will quickly learn new behaviors. But being nice to people in general, will still work. In fact, I claim, it tends to generalize even more, which is why people who have been around more varied communities tend to develop more generalized morality (higher Kegan levels).

Comment by Gunnar_Zarncke on Against empathy-by-default · 2024-10-17T13:01:55.859Z · LW · GW

I think the steelmanned version of beren's argument is

The potential for empathy is a natural consequence of learned reward models

That you indeed get for free. It will not get you far, as you have pointed out, because once you get more information, the model will learn to distinguish the cases precisely. And we know from observation that some mammals (specifically territorial ones) and most other animals do not show general empathy.

But there are multiple ways that empathy can be implemented with small additional circuitry. I think this is the part of beren's comment that you were referring to:

For instance, you could pass the RPE through to some other region to detect whether the empathy triggered for a friend or enemy and then return either positive or negative reward, so implementing either shared happiness or schadenfreude. Generally I think of this mechanism as a low level substrate on which you can build up a more complex repertoire of social emotions by doing reward shaping on these signals.

But it might even be possible that no additional circuitry is required if the environment is just right. Consider the case of a very social animal in an environment where individuals, esp. young ones, can rarely take care of themselves alone. In such an environment, there may be many situations where the well-being of others predicts your own well-being. For example, if you give something to the other (and that might just be a smile), that makes it more likely that you will be fed. This doesn't seem to necessarily require any extra circuits, though it might be more likely to bootstrap off some prior mechanisms, e.g., grooming or infant care.

This might not be stable because free-loading might evolve, but this is then secondary.

I wonder which of these cases this comment of yours is:

consider “seeing someone get unexpectedly punched hard in the stomach”. That makes me cringe a bit, still, even as an adult.

Comment by Gunnar_Zarncke on Nathan Young's Shortform · 2024-10-14T19:39:55.230Z · LW · GW

Can you say more about which concept you mean, exactly?

Comment by Gunnar_Zarncke on Parental Writing Selection Bias · 2024-10-13T16:36:53.746Z · LW · GW

Jeff could offer to receive such stories anonymously and repost them.

Comment by Gunnar_Zarncke on [Intuitive self-models] 4. Trance · 2024-10-09T06:32:09.726Z · LW · GW

You refer to status as an attribute of a person, but now I'm wondering how the brain represents status. I wouldn't rule out the possibility of high status being the same thing as the willingness to let others control you. 

Comment by Gunnar_Zarncke on [Intuitive self-models] 4. Trance · 2024-10-09T06:28:27.292Z · LW · GW

You might want to have a look at

The Collected Papers of Milton H. Erickson on Hypnosis Vol 1 - The Nature of Hypnosis and Suggestion

I read it some years ago and found it insightful, plausible, and fun to read, but couldn't wrap my mind around it forming a coherent theory. And from my recollection, many things in there confirm Johnstone and complement it, esp. the high-status aspects. There may be more.

Comment by Gunnar_Zarncke on [Intuitive self-models] 4. Trance · 2024-10-08T15:02:38.991Z · LW · GW

Crowds are trance-inducing because the anonymity imposed by the crowd absolves you of the need to maintain your identity.

In a tight crowd, it is easiest to do what the crowd is doing, and there are attractors for what works in a crowd (e.g., speed of movement), so the crowd's dynamics take over.

Comment by Gunnar_Zarncke on Consciousness As Recursive Reflections · 2024-10-06T08:58:31.719Z · LW · GW

Does LessWrong need link posts for astralcodexten? 

Not in general, no. 

Aren't LessWrong readers already pretty aware of Scott's substack?

I would be surprised if the overlap is > 50%

I'm linkposting it because I think this fits into a larger pattern of understanding cognition that will play an important role in AI safety and AI ethics.

Comment by Gunnar_Zarncke on Hyperpolation · 2024-10-05T15:01:16.824Z · LW · GW

Hi Newbie, what are your thoughts on it?

Comment by Gunnar_Zarncke on "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz · 2024-10-03T19:46:18.166Z · LW · GW

The advice in here might very well be of the "it seems obvious once you've read it" kind, but I think it's still useful

The problem is not that people don't know what to do. Just recently, I heard about a similar difficulty from esports players: They know what to do - farm gold regularly, kill enemies, keep map awareness, etc., whatever. It is just that, in the moment, the right action is elusive.

"Why didn't you retreat when you were low on health?"

"I knew I was low on health and had to retreat! But I thought the way to retreat was left (where more trouble turned up) and not right."  

Feel free to take that as a metaphor for relationships if you want XD.

That's why I like the section about the Freakout Tree so much: It describes a common conflict pattern and provides a resolution approach worth imitating.

Explicitly asking “Hey, can I have the tree?” has saved our bacon more than once.

Comment by Gunnar_Zarncke on Alignment via prosocial brain algorithms · 2024-10-03T16:00:16.743Z · LW · GW

I have updated to it being a mix. It is not only about being kept in check by others. There are benevolent rulers. Not all, and not reliably, but there seems to be potential.

Comment by Gunnar_Zarncke on Why is o1 so deceptive? · 2024-10-03T15:22:00.150Z · LW · GW

Convergence. Humans and LLMs with deliberation do the same thing and end up making the same class of errors.

Comment by Gunnar_Zarncke on Gunnar_Zarncke's Shortform · 2024-10-02T23:19:54.373Z · LW · GW

Just came across Harmonic mentioned on the AWS Science Blog. Sequoia Capital interview with the founders of Harmonic (their system which generates Lean proofs is SOTA for MiniF2F):
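
For readers who haven't seen Lean: a toy example of the kind of formal statement plus machine-checkable proof such systems produce (my own example, not Harmonic's output; MiniF2F consists of much harder competition problems stated in this style).

```lean
-- Toy Lean 4 theorem: a formal statement plus a proof term the kernel checks.
-- My own example, not Harmonic's output.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```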

Comment by Gunnar_Zarncke on LLMs are likely not conscious · 2024-09-29T21:56:37.882Z · LW · GW

I would remove that last paragraph. It doesn't add to your point and gives the impression that you might have a specific agenda.

Comment by Gunnar_Zarncke on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment · 2024-09-28T21:49:22.210Z · LW · GW

Have there been any followups or forks etc. of Botworld since it was created? It seemed very promising. There should be something.

Comment by Gunnar_Zarncke on Why is o1 so deceptive? · 2024-09-27T22:22:21.002Z · LW · GW

I notice that o1's behavior (its cognitive process) looks suspiciously like human behaviors:

  • Cognitive dissonance: o1 might fabricate or rationalize to maintain internal consistency in the face of conflicting data (which means there is inconsistency).
  • Impression management/Self-serving bias: o1 may attempt to appear knowledgeable or competent, leading to overconfidence because it is rewarded for the look more than for the content (which means the model is stronger than the feedback).

But why is this happening more when o1 can reason more than previous models? Shouldn't that give it more ways to catch its own deception? 

No:

  1. Overconfidence in plausibility: With enhanced reasoning, o1 can generate more sophisticated explanations or justifications, even when incorrect. o1 "feels" more capable and thus might trust its own reasoning more, producing more confident errors ("feels" in the sense of expecting to be able to generate explanations that will be rewarded as good).
  2. Lack of ground-truth: Advanced reasoning doesn't guarantee access to verification mechanisms. o1 is rewarded for producing convincing responses, not necessarily for ensuring accuracy. Better reasoning can increase the capability to "rationalize" rather than self-correct.
  3. Complexity in mistakes: Higher reasoning allows more complex thought processes, potentially leading to mistakes that are harder to identify or self-correct. 

Most of this is analogous to how more intelligent people ("intellectuals") can generate elaborate, convincing—but incorrect—explanations that cannot be detected by less intelligent participants (who may still suspect something is off but can't prove it). 

Comment by Gunnar_Zarncke on Gunnar_Zarncke's Shortform · 2024-09-25T22:58:46.161Z · LW · GW

Look inside an LLM. Goodfire trained sparse autoencoders on Llama 3 8B and built a tool to work with edited versions of Llama by tuning features/concepts.

https://preview.goodfire.ai/

(I am loosely affiliated, another team at my current employer was involved in this) 
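
For anyone wondering what "sparse autoencoder" means here, a minimal sketch (my own toy version in PyTorch; the sizes and training details are illustrative assumptions, not Goodfire's actual setup): an overcomplete dictionary of features is learned by reconstructing the LLM's activations under an L1 sparsity penalty, and "tuning a feature" then amounts to editing the feature activations before decoding.

```python
# Minimal sparse-autoencoder sketch for LLM activations (toy, illustrative sizes).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(f), f        # reconstruction and features

sae = SparseAutoencoder(d_model=4096, d_features=32768)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # trades reconstruction quality against sparsity

acts = torch.randn(64, 4096)  # stand-in for residual-stream activations from the LLM
x_hat, f = sae(acts)
loss = ((x_hat - acts) ** 2).mean() + l1_coeff * f.abs().mean()
loss.backward()
opt.step()

# "Tuning" a feature: scale or clamp one entry of f and decode again to get
# an edited activation vector to feed back into the model.
```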

Comment by Gunnar_Zarncke on [Intuitive self-models] 1. Preliminaries · 2024-09-21T00:13:45.559Z · LW · GW

…So that’s all that’s needed. If any system has both a capacity for endogenous action (motor control, attention control, etc.), and a generic predictive learning algorithm, that algorithm will be automatically incentivized to develop generative models about itself (both its physical self and its algorithmic self), in addition to (and connected to) models about the outside world.

Yes, and there are many different classes of such models. Most of them are boring, because the prediction of the effect of the agent on the environment is limited (small effect or low data rate) or simple (linear-ish or more-is-better-like).

But the self-models of social animals will quickly grow complex, because predicting the effect of their actions on the environment involves elements of the environment - other members of the species - that themselves predict the actions of other members.

You don't mention it, but I think Theory of Mind or Empathic Inference plays a large role in the specific flavor of human self-models.