Comments
Actually out of curiosity, why 4x? (And what exactly do you mean by "2x larger"?) (And is this for a naive algorithm which can be improved upon or a tight constraint?)
Thanks for pointing that out! I will tell my friends to make sure they actually get good data for the metabolic cost and not just fall back on cortical neuron count as a proxy if they cannot find anything better.
(Or is there also another point you wanted to make?) And yeah it's actually also an argument for why orcas might be less intelligent (if they sorta use their neurons less often). Thanks.
My guess is that there probably aren't a lot of simple mutations which just increase intelligence without increasing cortical neuron count. (Though probably simple mutations can shift the balance between different sub-dimensions of intelligence as constrained through cortical neuron count.) (Also of course any particular species has a lot of deleterious mutations going around and getting rid of those may often just increase intelligence, but I'm talking about intelligence-increasing changes to the base genome.)
But there could be complex adaptations that are very important for abstract reasoning. Metacognition and language are the main ones that come to mind.
So even if my friends' investigation shows that the number of cortical neurons is a strong indicator, it could still be that humans were just one of the rare cases which evolved a relevant complex adaptation. But it would be significant evidence for orcas being smarter.
An argument against orcas being more intelligent than humans runs thus: Orcas are much bigger than humans, so the fraction of the metabolic cost the brain consumes is smaller than in humans. Thus it took more selection pressure for humans to evolve 21 billion cortical neurons than for orcas to evolve 43 billion.[1] Thus humans might have other intelligence-increasing mutations that orcas didn't evolve yet.
So the question here is "how much does scale matter vs other adaptations". Luckily, we can get some evidence on that by looking at other species and rating how intelligent they are and correlating that with (1) number of cortical neurons and (2) fraction of metabolic cost the brain uses, to see how strong of an indicator each is for intelligence.
I have two friends who are looking into this for a few hours on the side (one tries to find cortical neuron and metabolic cost data, and the other looks at animal behavior to rate intelligence (without knowing about neuron counts or the like)). It'll be a rather crappy estimate, but hopefully we'll at least have some evidence from this in a week.
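To illustrate the kind of analysis I have in mind, here is a minimal sketch with made-up placeholder numbers (the real cortical neuron counts, metabolic fractions, and blinded behavioral ratings are exactly what still needs to be gathered):

```python
# Minimal sketch of the planned comparison, with made-up placeholder numbers.
# The real cortical neuron counts, brain metabolic fractions, and blinded
# behavioral intelligence ratings are what still needs to be collected.
from scipy.stats import spearmanr

species = ["chimp", "dog", "raven", "elephant", "pig"]
cortical_neurons_bn = [6.2, 0.5, 1.2, 5.6, 0.4]        # placeholders, not real data
brain_metabolic_frac = [0.09, 0.04, 0.05, 0.02, 0.02]  # placeholders, not real data
intelligence_rating = [8, 5, 7, 7, 4]                  # blinded behavioral rating, 1-10

rho_neurons, _ = spearmanr(cortical_neurons_bn, intelligence_rating)
rho_metabolic, _ = spearmanr(brain_metabolic_frac, intelligence_rating)
print("neuron count vs rating:       rho =", round(rho_neurons, 2))
print("metabolic fraction vs rating: rho =", round(rho_metabolic, 2))
```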
- ^
Of course metabolic cost doesn't necessarily need to be linear in the number of cortical neurons, but it'd be my default guess, and in any case I don't think it matters for gathering evidence across other species as long as we can directly get data on the fraction of the metabolic cost the brain uses (rather than estimating it through neuron count).
Another thought:
In what animals would I on priors expect intelligence to evolve?
- Animals which use collaborative hunting techniques.
- Large animals. (So the neurons make up a smaller share of the overall metabolic cost.)
- Animals that can use tools so they benefit more from higher intelligence.
- (Perhaps some other stuff, like cultural knowledge being useful, or having enough slack for intelligence increases from social dynamics to be possible.)
AFAIK, orcas are the largest animals that use collaborative hunting techniques.[1] That plausibly puts them second behind humans for where I would expect intelligence to evolve. So it doesn't take that much evidence for me to be like "ok looks like orcas also fell into some kind of intelligence attractor".
- ^
Though I heard sperm whales might sometimes collaborate too, though not nearly as sophisticatedly, I guess. But I also wouldn't be shocked if sperm whales are very smart. They have the biggest animal brains, but I don't know whether their cortical neuron count is known.
Main pieces I remember were: orcas already dominating the planet (like humans do), and large sea creatures going extinct due to orcas (similar to how humans drove several species extinct; Megalodon? Probably extinct for different reasons, so weak evidence against? Most other large whales are still around).
To clarify for other readers: I do not necessarily endorse that this is what we would expect if orcas were smart.
(Also I read somewhere that apparently chimpanzees sometimes/rarely can experience menopause in captivity.)
If the species is already dominating the environment then the pressure from the first component compared to the second decreases.
I agree with this. However I don't think humans had nearly sufficient slack for most of history. I don't think they dominated the environment until 20,000 years[1] ago or so, and I think most improvements in intelligence come from earlier.
That's why I'm attributing the level of human intelligence in large part to runaway sexual selection. Without it, as soon as interspecies competition became the most important factor for reproductive success, natural selection would not push for even greater intelligence in humans, even though it could improve our ability to dominate the environment even more.
I'm definitely not saying that group selection led to intelligence in humans (only that group selection would've removed it over long timescales if it wasn't useful). However I think that there were (through basically all of human history) significant individual fitness benefits from being smarter that did not come from outwitting each other, e.g. being better able to master hunting techniques and thereby gaining higher status in the tribe.
- ^
Or could also be 100k years, idk
I'm not sure how it's relevant.
I thought that if humans were vastly more intelligent than they needed to be, they would learn all the relevant knowledge quickly enough to reach their peak in their 20s.
And if the trait that runaway sexual selection is propagating is itself helpful in competition with other species, which is obviously true for intelligence, there is just no reason for such straightening-out over a long timescale.
I mean, for an expensive trait like intelligence I'd say the benefits need to at least almost be worth the costs, and then I'd rather attribute the selection for intelligence to "because it was useful" than to "because of runaway selection".
(For reference I think Tsvi and GeneSmith have much more relevant knowledge for evaluating the chance of superbabies being feasible and I updated my guess to like 78%.)
(As it happens I also became more optimistic about the orca plan (especially in terms of how much it would cost and how long it would take, but also a bit in how likely I think it is that orcas would actually study science) (see footnote 4 in post). For <=30y timelines I think the orca plan is a bit more promising, though overall the superbabies plan is more promising/important. I'm now seriously considering pivoting to the orca plan though.) (EDIT: tbc I'm considering pivoting from alignment research, not superbaby research.)
(haha cool. perhaps you could even PM Abram if he doesn't PM you. I think it would be pretty useful to speed up his agenda through this.)
Thanks!
I agree that sexual selection is a thing - that it's the reason for e.g. women sometimes having unnecessarily large breasts.
But I think it gets straightened out over long timescales - and faster the more expensive the trait is. And intelligence seems ridiculously expensive in terms of the metabolic energy our brains use (or childbirth mortality).
A main piece that updated me was reading anecdotes in Scott Alexander's book review of "The Secret of Our Success", after which I now think that humans did need their intelligence for survival. (E.g. 30-year-old hunter-gatherers perform better at hunting etc. than hunter-gatherers in their early 20s, even though the latter are more physically fit.)
A few more thoughts:
It's plausible that for both humans and orcas the relevant selection pressure mostly came from social dynamics, and it's plausible that there were different environmental pressures.
Actually my guess would be that it's because intelligence was environmentally adaptive, because my intuitive guess would be that group selection[1] is significant enough over long timescales that it would disincentivize intelligence if it weren't already (almost) useful enough to warrant the metabolic cost, unless the species has a lot of slack.
So an important question is: How adaptive is high intelligence?
In general I would expect that selection pressure for intelligence was significantly stronger in humans, but maybe for orcas it was happening over a lot longer time window, so the result for orcas could still be more impressive.
From what I observed about orca behavior, I'd perhaps say a lower bound on their intelligence might roughly be that of human 15-year-olds or so. So up to that level of intelligence there seem to be benefits that allow orcas to use more sophisticated hunting techniques.
But would it be useful for orcas to be significantly smarter than humans? My prior intuition would've been that probably not very much.
But I think observing the impressive orca brains mostly screens this off: I wouldn't have expected orcas to evolve to be that smart, and I similarly strongly wouldn't have expected them to have such impressive brains, and seeing their brains updates me that there had to be some selection pressure to produce that.
But the selection pressure for intelligence wouldn't have needed to be that strong compared to humans for making the added intelligence worth the metabolic cost, because orcas are large and their neurons make up a much smaller share of their overall metabolic consumption. (EDIT: Actually (during some (long?) period of orca history) selection pressure for intelligence also would've needed to be stronger than selection pressure for other traits (e.g. making muscles more efficient or whatever).)
And that there is selection pressure is not totally implausible in hindsight:
- Orcas hunt very collaboratively, and maybe there are added benefits from coordinating their attacks better. (Btw, orcas live in matrilines, and I'd guess that from an evolutionary perspective the key thing to look at is how well a matriline performs, not individuals, but not sure. So there would be high selection for within-matriline cooperation (and perhaps communication!).)
- Some (many?) orca sub-species prey on other smart animals like dolphins or whales, and maybe orcas needed to be significantly smarter to outwit the defensive strategies their prey learn.
But overall I know way too little about orca hunting techniques to be able to evaluate those.
ADDED 2024-11-29:
To my current (not at all confident) knowledge, orcas split off from the other still-living dolphin species 5-10 million years ago (so sorta similar to humans - maybe slightly longer for orcas). So selection pressure must've been relatively strong, I guess.
Btw, bottlenose dolphins (which have iirc 12.5 billion cortical neurons) are to orcas sorta like chimps are to humans. One could look at how smart bottlenose dolphins are compared to chimps.
(There are other dolphin species (like pilot whales) which are probably smarter than bottlenose dolphins, but those aren't studied any more than orcas are, whereas bottlenose dolphins are well studied.)
- ^
I mean group selection that could potentially be on a level of species where species go extinct. Please lmk if that's actually called differently.
Thanks. Can you say more about why?
I mean runaway sexual selection is basically H1, which I updated to being less plausible. See my answer here. (You could comment there why you think my update might be wrong or so.)
My prior intuitive guess would be that H1 seems quite a decent chunk more likely than H2 or H3.
Actually I changed my mind.
Why I thought this before: H1 seems like a potential runaway-process and is clearly about individual selection which has stronger effects than group selection (and it was mentioned in HPMoR).
Why I don't think this anymore:
- It would also be an incredibly huge coincidence if intelligence mostly evolved because of social dynamics but happened to be useful for all sorts of other survival techniques hunters and gatherers use. See e.g. Scott Alexander's book review of "The Secret of Our Success".
- If there were only individual benefits for intelligence but it was not very useful otherwise, then over long timescales group selection[1] would actually select against smarter humans, because their neurons would use up more metabolic energy.
However, there's a possibly very big piece of evidence for H3: Humans are both the smartest land animals and have the best interface for using tools, and that would seem like a suspicious coincidence.
I think this is not a coincidence but rather that tool use let humans fall into an attractor basin where payoffs of intelligence were more significant.
- ^
I mean group selection that could potentially be on a level of species where species go extinct. Please lmk if that's actually called differently.
(Major edits added on 2024-11-29.)
Some of my own observations and considerations:
Anecdotal evidence for orca intelligence
(The first three anecdotes were added 2024-11-29.)
- Orcas leading an orca researcher's boat 15 miles home through the fog. (See the 80s clip starting at 8:10 in this YouTube video.)
- Orcas can use bait.
- An orca family hunting a seal can pretend to give up and retreat, and when the seal comes out thinking it's safe, BAM - one orca stayed behind to catch it. (Told by Lance Barrett-Lennard somewhere in this documentary.[1])
- Intimate cooperation between Australian whalers (originally Indigenous Australians) and orcas for whale hunting: https://en.wikipedia.org/wiki/Killer_whales_of_Eden,_New_South_Wales
- Orcas being skillful at turning boats around and even sinking a few vessels[2][3]: https://en.wikipedia.org/wiki/Iberian_orca_attacks
- Orcas have a wide variety of cool hunting strategies (e.g. see videos (1, 2)). I don't know how this compares to human hunter-gatherers. (EDIT: Ok, I just read Scott Alexander's book review of "The Secret of Our Success" and some anecdotes on hunter-gatherers there seem much more impressive. (But it's also plausible to me that other orca hunting techniques are more sophisticated than these examples, in ways that might not be legible to us.))
(ADDED 2024-11-10[4]: Tbc, while this is more advanced than I'd a priori have expected from animals, the absence of observations of even more clearly stunning techniques is some counterevidence against orcas being smarter than humans. Though I also can't quite point to an example of something I'd expect to see if orcas were actually 250 IQ but don't observe - but I also didn't think about it for long and maybe there would be sth.)
(Mild counterevidence added 2024-12-02:)
- Btw it's worth noting that orcas do sometimes get tangled up in fishing gear or strand (and die as a result), though apparently less frequently than other cetaceans - though I didn't check precisely whether it's really less frequent per individual.
- Worth noting that there are only 50,000-100,000 orcas in the world, which is fewer than for many other cetacean species, though I'm not sure whether it's less in terms of biomass.
Orca language
(EDIT: Perhaps just skip this orca language section. The relevant point is that orca language is definitely learned and not innate. Otherwise not much is known, except that we can eyeball the complexity of their calls. You could take a look by listening[5] here.[6] I'd say it seems very slightly less complex than human language (though it could be more complex), and much more complex than what is observed in land animals other than humans.)
(Warning: Low confidence. What I say might be wrong.)
I didn't look deeply into research on orca language (not much more than watching this documentary), but my impression is that we don't know much yet.
Some observations:
- Orca language seems to be learned, not innate. Different regions have different languages and dialects. Scientists seem to analogize it to how humans speak different languages in different countries.
- For some orca groups that were studied, scientists were able to cluster their calls into 23 or 24 different call clusters, but still with significant variation of calls within a cluster. (A rough sketch of how such clustering might be done is included below, after these considerations.)
- (I do not know how tightly calls are clustered, or whether there often are outliers.)
- Orcas communicate a lot. (This might be wrong but I think they spend a significant fraction of their time socializing where they exchange multiple calls per minute.)
- (Orcas emit clicks and whistles. The clicks are believed to be for spatial navigation (especially in the dark), the whistles for communication.) (EDIT: Actually there are also pulsed calls, which I initially lumped in with whistles but which are emitted in pulses. Those are probably the main medium of communication.)
I'd count the second observation (the 23-24 call clusters) as some weakish evidence against orcas having as sophisticated a language as humans, though not very strong evidence. Some considerations:
- Sentences don't necessarily need to be formed as temporal sequences of words; words could also be different frequency signals which are then simultaneously overlaid.
- (The 24 different call types could be all sorts of things. E.g. conveying what we convey through body language, facial expressions, and tone. Or e.g. different sentence structures. Idk.)
- Their language might be very alien. I only have shitty considerations here but e.g.:
- Orca language doesn't need to have at all similar grammar. E.g. could be something as far from our language as logic programming is, though in the end still not nearly that simple.
- Orcas might often describe situations in ways we wouldn't describe them. E.g. rather about what movements they and their prey executed or sth.
- Orcas might describe more precisely where in 3D water particular orcas and animals were located, and they might have a much more efficient encoding for that than if we tried to communicate this.
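(Not from any actual orca study, but to make the "eyeball the complexity" idea concrete: here is a rough sketch of how one might cluster recorded calls oneself, assuming some local .wav recordings plus the librosa and scikit-learn libraries; the file path, feature choice, and cluster count are placeholders.)

```python
# Rough sketch (not how the cited researchers did it): summarize each recorded
# call by its averaged MFCC spectral features, cluster the calls, then see
# how evenly they spread over the clusters.
import glob
import numpy as np
import librosa
from sklearn.cluster import KMeans

features = []
for path in glob.glob("orca_calls/*.wav"):        # placeholder path to call recordings
    audio, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    features.append(mfcc.mean(axis=1))            # crude fixed-length summary per call

X = np.stack(features)
labels = KMeans(n_clusters=24, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))                        # cluster sizes
```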
More considerations
The main piece of evidence that makes me wonder whether orcas might actually be significantly smarter than humans is their extremely impressive brain. I think it's pretty strong though.
As mentioned, orcas have 2.05 times as many neurons in their neocortex as humans, and when I look through the Wikipedia list (where I just trust measured[7] and not estimated values), it seems to be a decent proxy for how intelligent a species is.
There needs to be some selection pressure for why they have 160 times more neurons in their neocortex than e.g. brown bears (which weigh like 1/8th of an orca or so). Size alone is not nearly a sufficient explanation.
It's plausible that for both humans and orcas the relevant selection pressure mostly came from social dynamics, and it's plausible that there were different environmental pressures. (I'm keen to learn.) It's possible that whatever caused humans to be smart more strongly incentivized our brains to be able to do abstract reasoning, whereas for orcas intelligence might've been useful for particular skills that generalize less well to doing other stuff.
If I'd only ever seen hunter gatherer humans, even if I could understand their language, I'm not sure I'd expect that species to be able to do science on priors. But humans are able to do it. Somehow our intelligence generalized far outside the distribution we were optimized on. I don't think that doing science is similar to anything we've been optimized on, except that advanced language might be necessary.
On priors I wouldn't really see significant reasons why whatever selection pressures optimized orcas to have their astounding brains, would make their intelligence generalize less well to doing science, than whatever selection pressures produced our impressive human brains.
One thing that would update me significantly downwards on orcas being able to do science is if their prefrontal cortex doesn't contain that many neurons. (I didn't find that information quickly, so please lmk if you find it.) Humans have a very large prefrontal cortex compared to other animals. My guess would be that orcas do too, and that they probably still have >1.5 times as many neurons in their prefrontal cortex as humans, and TBH I even wouldn't be totally shocked if it's >2.5 times. (EDIT: The cortex of the cetacean brain is organized differently than in most mammals and AFAIK we currently cannot map functionality very well.)
(Read my comments below to see more thoughts.)
- ^
You might need a VPN to Canada to watch it.
- ^
Btw there is no recorded case of a human having been killed by an orca in the wild, including when people needed to swim after a vessel was sunk. (Even though orcas often eat other mammals.) (I think I even once heard it mentioned that it seemed like the orcas made sure no humans died from their attacks, though I don't at all know how active the orcas' role was there (my guess is not very).)
- ^
I'd consider it plausible that they were trying to signal us to please stop fishing that much, but I didn't look nearly deeply enough into it to judge.
- ^
Actually I don't remember exactly when I added this. I still think it's true but to a weaker extent than I originally thought.
- ^
Or downloading the files and looking at the spectrogram in e.g. Audacity.
- ^
If you want to take a deeper look, here are more recordings.
- ^
Aka optical or isotropic fractionator in the method column.
Thanks!
What's in the etc.? Or do you have links for where to learn more? (What's the name of the field?)
(I thought wikipedia would give me a good overview but your list was already more useful to me.)
Thanks. No I didn't. (And probably don't have time to look into it but still nice to know.)
Justification for this:
I don't think organisms end up with 40 billion cortical neurons without either some strong selection for at least some sub-dimensions of intelligence, or being as big as Godzilla.
One could naively expect that the neuron count of (especially touch and motor) sensory processing modules is proportional to the surface area of an organism. However I think this is unrealistic: bears don't need nearly as fine a precision on which square centimeter of skin was touched (or by which millimeter the paw moves) as mice do, and generally this is because precision becomes less relevant with body size.
So let's say the sensor spacing an organism can afford is proportional to the square root of its 1-dimensional size (where 1-dimensional size means sqrt(surface_area), i.e. roughly height). Aka if a mouse is 5cm tall and a bear 2m, the spacing between sensors on the mouse skin vs on the bear skin scales like sqrt(0.05) vs sqrt(2). The number of sensors is the skin surface area divided by the square of the spacing; surface area is proportional to the square of the 1-dimensional size and the squared spacing is proportional to the 1-dimensional size, so the overall number of sensors is proportional to the 1-dimensional size (aka sqrt(surface_area)).
A brown bear has 250 million neurons in the neocortex and is maybe 2m tall. So to get to 40 billion neurons just by scaling size, an organism would have to be (40/0.25) * 2m = 320m tall. So actually bigger than Godzilla.
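A tiny sketch of that back-of-the-envelope calculation (under the linear-in-height scaling assumption above):

```python
# Back-of-the-envelope check of the argument above: if sensory/motor neuron
# count scales roughly linearly with body height, how tall would an animal
# need to be to reach 40e9 cortical neurons purely through body size,
# starting from a ~2 m brown bear with ~0.25e9?
bear_neurons = 0.25e9
bear_height_m = 2.0
target_neurons = 40e9

required_height_m = (target_neurons / bear_neurons) * bear_height_m
print(required_height_m)  # 320.0 m, i.e. bigger than Godzilla
```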
I don't think I'm the right person to look into this.
I just updated quickly via conservation of expected evidence. (I agree though that I'd be a bit concerned about most people updating that quickly. If you think I've gone slightly psychotic, please bet with me so I update harder if I notice you're right.)
(EDIT: Actually it's sorta shitty because we might not get more evidence, since I have even more important things to do and probably don't have time to look into it myself, but I'm happy to bet, though I'd probably want to revise my betting probability.)
I'm happy to bet on "By the end of 2034, does Tsvi think that it's >60% likely that orcas could do superhuman science if they had similar quality and quantity of science education as scientists and were motivated for this, conditional on Tsvi having talked to me for at least 2 hours about this {sometime in 2030-2034}/{when I might have more evidence}?"
I'd currently be at like 26% on this (revised down from 30%), though if you take more time to think about it I might adjust the estimate I am willing to bet on.
I'm happy to bet up to 200$ per bit (or maybe more but would have to think about it). Aka if it's resolved "Yes", money flowing from you to me would be (and if it's resolved "No" it would be ). (Where negative money flow indicates flow into the other direction.)
(Also obviously you'd need to commit to talking to me for 2h sometime when I have more evidence, and not just avoid resolution by not talking to me.)
I don't know what you mean by this:
The thing is, there's probably gonna be like ten other posts in the reference class of this post, and they just... don't leave much of a dent in things?
- I don't think it makes that much sense to just look at cortical neuron counts. Big bodies ask for many neurons, including cortical motor neurons. Do cetaceans have really big motor cortices? Visual cortices? Olfactory bulbs? Keyword "allometry". Yes, brains are plastic, but that doesn't mean orcas are actually ever doing higher mathematics with their brains.
See this comment.
- Scale matters, but I doubt it's very close to being the only thing! Humans likely had genetic adaptations for neuroanatomical phenotypes selected-for by some of: language; tool-making; persisting transient mental content; intent-inference; intent-sharing; mental simulation; prey prediction; deception; social learning; teaching; niche construction/expansion/migration. Orcas have a few of these. But how many, how much, for how long, in what range of situations and manifestations?
I already considered this. (I just posted a question about this.) I don't have good information on the extent to which orcas have those, but my guesses are already reflected in my overall guess in the post.
Why do you think orcas have few of those? To me it seems plausible that orcas have everything except tool use and niche construction.
I do think there was some significant selection for some kind of intelligence in dolphins and orcas - the main question here is whether being optimized for tool use (IF that was a significant driver of the selection for human intelligence) would make the brain's potential generalize to doing science significantly better than if the brains were optimized because of social dynamics or hunting strategies.
But of course there are other considerations like "maybe you need fully recursive language to be able to have the abstract reasoning take off, and this might very well come from some adaptations that are not just about neuron counts, and maybe orcas don't have that".
I already took all my current uncertain consideration on this into account when I said "50% that they would be superhuman at science if they had similar quality and quantity of science education as scientists and were motivated for this".
Or do you think a cow brain scaled to 40 billion neurons would be superhuman?
I don't know what you're asking for here. I don't think organisms end up with 40 billion cortical neurons without either some strong selection for at least some sub-dimensions of intelligence, or being as big as Godzilla.
I'm not really excited about just smushing together more brain tissue without it having been optimized to work well together, but orca brains were optimized.
- Culture matters. The Greeks could be great philosophers... But could a kid living in 8000 BCE, who gets to text message with an advanced alien civilization of kinda dumb people, become a cutting edge philosopher in the alien culture? Even though almost everyone ze interacts with is preagricultural, preliterate? I dunno, maybe? Still seems kinda hard actually?
Yep that's why I'm only at like 15% that we get very significant results out of it in the next 30 years even if we tried hard. (aka 30% conditional on orcas being smart enough.)
- Superbabies is good. It would actually work. It's not actually that hard. There's lots of investment already in component science/tech. Orcas doesn't scale. No one cares about orcas. There's not hundreds of scientists and hundreds of millions in orca communications research. Etc. The sense of this plan being weird is a good sense to investigate further. It's possible for superficial weirdness to be wrong, but don't dismiss the weirdness out of hand.
I mean, if orcas are smarter they might be super vastly smarter, so you wouldn't need that many.
Superbabies would work well given multiple generations, but I'm also only at like 30% that we'd get +7std humans born within 10 years even if we tried similarly hard[1], and I think it's pretty unlikely we have more than 40 years left without strong governance success. (E.g. afaik we still have problems cloning primates well (even though cloning has been around for a long time), and those are just sub-difficulties[2] of e.g. creating superbabies through repeated embryo selection.)
I think the number of neurons in the neocortex (or even better, in the prefrontal cortex - but unfortunately I didn't quickly find how big the orca prefrontal cortex is, though I'd guess it to still be significantly bigger than in humans) is a much, much better proxy for the intelligence of a species than brain size (or encephalization quotient). (E.g. see the Wikipedia list linked in my question here.)
(Also see here. There are more examples, e.g. a blue-and-yellow macaw has 1.9 billion, whereas brown bears have only 250 million.)
EDIT: Tbc I do think that larger bodies require more neurons in touch-sense and motor parts of the neocortex, so there is some effect of how larger animals need a bit larger brains to be similarly smart, but I don't think this effect is very strong.
But yeah there are other considerations too, which is why I am only at 50% that orcas could do science significantly better than humans if they tried.
I feel like life-force seems like a sensation that's different from what I'd expect from just having a thing in the world model with inherent surprisingness and ends-without-trajectory-predictions/"optimizerness" attached. ("Life-force" sounds more like "as if the thing had a soul" to me. I do not understand where this comes from but I don't see how I'd predict such a sensation in advance given just the inherent-surprisingness + optimizerness hypothesis.)
Thanks for communicating your model well again!
I think we might mostly agree, but let's clarify.
I agree with all of:
In the course of predicting them well, the world-model invents some slightly-higher-level concept (or family of closely-interlinked concepts) that we call “cold”. And it notices and memorizes predictively-useful relationships between this new “cold” concept and other things in the world-model, e.g. shivering and ice.
I don’t think there’s more to the concept “cold” than the sum total of its associations with every other concept, with sensory input, and with motor output.
I also basically agree with:
I like to draw the distinction between understanding learning algorithms and understanding trained models. The former is kinda like what you learn in an ML course (gradient descent, training data, etc.) , the latter is kinda like what you learn in a mechanistic interpretability paper. I don’t think it’s realistic to “write code” for the “cold” concept, because I think it (like all concepts) emerges at the trained model level. It emerges from a learning algorithm, training environment, loss function, etc.
I agree that fully writing code would be quite a daunting task. I think my phrasing of "write code" was not great. But it's already some reductionist progress if you have something like:
if coldness concept gets more activated: increase activation of shivering anticipation; weakly increase activation of snow concept; ...
I don't think it's a worthwhile exercise to get very precise.
An important point I wanted to make here is just that the meaning of "cold" comes from the interactions with other concepts, and there's no such thing as an inherent independent meaning of the word "cold". (So when I hear 'If we look at naturalistic visual inputs that directly or indirectly trigger C, and they’re disproportionately pictures of clocks, then that’s some evidence that C “means” clock.' this seems a bit off to me, though not too bad.)
I guess I'd best try to explain why I felt some unease with your initial description of the cold example:
Suppose somebody said:
There’s a certain kind of interoceptive sensory input, consisting of such-and-such signal coming from blah type of thermoreceptor in the peripheral nervous system. Your brain does its usual thing of transforming that sensation into its own “color” of “metaphysical paint” (as in §3.3.2) that forms a concept / property in your conscious awareness and world-model, and you know it by the everyday term “cold”.
On the one hand, I would defend this passage as basically true.
Basically I think that some people - though a priori not you - would think that something like "I feel cold because the cold thermoreceptors activate the corresponding cold concept" explains their sense of cold. However, if you just take this hypothesis, which basically is "some sensors activate some concept", without anything else, then the concept would be completely shapeless and uninterpretable - unrelated to anything known.
I now think you probably didn't mean it in a nearly that bad way but not sure.
(But some parts of what you write seem to me like you have slightly weaker sensors about "how does a hypothesis actually constrain my anticipations / concentrate probability mass" or "what would this hypothesis predict if I didn't already know how I perceive it", and I do think those sensors are useful.)
(I also think that there is some hypothalamus-or-so business logic for what responses to trigger (e.g. shivers) from significant cold input signals that would need to be figured out if you want a good model of freezing/feeling-uncomfortably-cold, but that's about freezing in particular and not temperature as a property we model on objects.)
The post is not joking. (But thanks for feedback that you were confused in this way!)
I basically didn't know much about orcas before I learned 3 days ago that they have 2.05 times as many neurons in the neocortex as humans, and then I spent yesterday and the day before looking into how smart orcas might be and evaluating the evidence. So I'm far from an expert, but I also didn't run into strong evidence that they are dumber than hunter-gatherer humans, and I found some weak-to-medium-strong evidence that they might be at least as smart as 15-year-olds. But it's still possible that orca researchers have observed orcas not finding some strategies they would've thought of or sth.
But yeah the only piece of evidence that they might be significantly smarter than humans is their brains. I consider it reasonably strong though.
ADDED:
I edited "Though I don't know that much about orcas" to "Though I only tried to form a model of orca intelligence for 1-2 days". Thanks!
Hi Steve, I didn't read this post yet and just wanted to ask whether it's still worth reading or whether everything relevant is now better in "incentive learning and dead sea salt experiment"?
In the Wikipedia list, the estimated number of neurons in the neocortex of a blue whale is 5 billion (compared to 43 billion for orcas), even though blue whales are much larger. (Unfortunately the blue whale figure is just an estimate and not grounded in optical or isotropic fractionator measurements.)
(EDIT: Hm, interesting - the linked reddit post mentions 15 billion for blue whales. Not sure which is correct.)
I agree that memory and beliefs are in some sense optional addons. I don't understand precisely enough yet how we model animals.
On your section on cold:
First, I'm still not sure in what way you're using "cold" of the two interpretations I indicated here: "(where I assume you talk about mental-physiological reactions to freezing/feeling-cold (as opposed to modelling the temperature of objects))".
But in either case I mostly just mean that having a full reductionist explanation of e.g. cold is an extremely high standard that ought to fulfill the following criteria:
- You can replace the word "cold" and other related abstract words with some other token-sequences/made-up-words, and someone who had a sufficiently good understanding would still be able to figure out that the new made-up-word corresponds to the concept we call "cold".
- (Where I don't think your explanation had anything in it where you couldn't just replace "cold" with "heat" or "redness" (except redness wouldn't work if we allow "thermoreceptor", but I'd also want to rename this to "receptor-type-abc").)
- You can sorta write code for a relevant part of what's happening in the mind when e.g. the freezing emotion/sensation is triggered.
- (Like you would not need to describe a fully conscious program, but the function that triggers how muscles contract and the sensation of wanting to curl up and the skin shivering and causes a negative hedonic tone as well as instantiating a subgoal of getting thermoreceptors to report higher temperature or sth. Like I'd count this description as a weak reductionist hypothesis (which makes progress on unpacking the "cold" concept but where there are more levels of unpacking to do), though it might be very incomplete and partially wrong.)
Like, I'm not sure we disagree much here. I think everything you said is correct, but I feel like emphasizing that there are still more layers of understanding that need to get unpacked, and that saying "it's a concept that's useful to predict sensory data" still leaves open questions of what exactly the information is that the concept has the ability to communicate, or of how the concept relates to other concepts.
Thanks for being so wonderfully precise to make it easy for me to reply!
The part where you lose me is here:
Meanwhile, in our everyday experience, we all have an intuitive sense of animation / agency.
Where does this sense of agency come from? Likewise:
When we do this kind of analysis well, we’ll wind up describing every aspect of our actual everyday intuitions around animation / agency / alive-ness, and predicting all the items in §3.3.
How do we get from something seeming inherently surprising to something seeming agentic or imbued with life-force?
EDITED TO ADD: Tbc I think you can explain agency (though not life-force, and you need to be careful to only interpret agency in this limited sense) through being able to predict outcomes without trajectories (as you also seem to have realized, as in "(derived from a pattern where I can make medium-term predictions despite short-term surprise)"). I wouldn't equate agency with inherent surprisingness though, although they often occur together.
Hm interesting. I mean I'd imagine that if we get good heuristic guarantees for a system it would basically mean that all the not-perfectly-aligned subsystems/subsearches are limited and contained enough that they won't be able to engage in RSI. But maybe I misunderstand your point? (Like maybe you have specific reason to believe that it would be very hard to predict reliably that a subsystem is contained enough to not engage in RSI or so?)
(I think inner alignment is very hard and humans are currently not (nearly?) competent enough to figure out how to set up training setups within two decades. Like for being able to get good heuristic guarantees I think we'd need to at least figure out at least something sorta like the steering subsystem which tries to align the human brain, only better because it's not good enough for smart humans I'd say. (Though Steven Byrnes' agenda is perhaps a UANFSI approach that might have sorta a shot because it might open up possibilities of studying in more detail how values form in humans. Though it's a central example of what I was imagining when I coined the term.))
How bottlenecked is your agenda by philosophy skills (like being good at thought experiments for deriving stuff like UDT, or like being good at figuring out the right ontology for thinking about systems or problems) vs math skill vs other stuff?
Idk, that could be part of finding heuristic arguments for desirable properties of what a UANFSI converges to. Possibly it's easier to provide probabilistic convergence guarantees for systems that don't do FSI, so this would already give some implicit evidence. But we could also just say that it's fine if FSI happens as long as we have heuristic convergence arguments - like, UANFSI just allows for a broader class of algorithms, which might make stuff easier - though I mostly don't expect we'd get FSI alignment through this indirect alignment path from UANFSI, but rather that we'd get an NFSI AI if we get some probabilistic convergence guarantees.
(Also I didn't think much about it at all. As said I'm trying KANSI for now.)
Thanks!
(I don't fully understand yet what results you're aiming for, but yeah, it makes sense that probabilistic guarantees make some stuff more feasible. Not sure whether there might be more relaxations I'd be fine with at least initially making.)
Thanks for writing up some of the theory of change for the tiling agents agenda!
I'd be curious about your take on the importance of the Löbian obstacle: I feel like it's important to do this research for aligning full-blown RSI-to-superintelligence, but at the same time it introduces quite some extra difficulty, and I'd be more excited about research (which ultimately aims for pivotal-act level alignment) where we're fine assuming some "fixed meta level" in the learning algorithm that is nevertheless general enough that the object-level AI can get very powerful. It seems to me that this might make it easier to prove/heuristically-argue that the AI will end up with some desirable properties.
Relatedly, I feel like on Arbital there were the categories "RSI" and "KANSI", but AFAICT not clearly some third category like "unknown-algorithm non-full-self-improving (UANFSI?) AI". (Where IMO current deep learning clearly fits into the third category, though there might be a lot more out there which would too.) I'm currently working on KANSI AI, but if I weren't, I'd be a bit more excited about (formal) UANFSI approaches than full RSI theory, especially since the latter seems to have been tried more. (E.g. I guess I'd classify Vanessa Kosoy's work as UANFSI, but I didn't look much at it yet.) (Also there can still be some self-improvement for UANFSI AIs, but as said there would be some meta level that stays fixed.)
But it's possible I strongly misunderstand something (e.g. maybe the Löbian obstacle isn't that central?).
(In any case I think there ought to be multiple people continuing this line of work.)
You could also ask the SERI MATS team whether they still accept you as a mentor for the coming winter cohort (and allow you to have extended application phase where people can apply until mid November or sth).
(I'd guess you might get slightly better candidates there, though I think applicant quality generally degraded somewhat since the first cohorts.) EDIT: I do not think this anymore.
Thanks!
Yeah, I believe what you say about the long-distance connections not being that many.
I meant that there might be more non-long-distance connections between neighboring areas. (E.g. boundaries of areas are a bit fuzzy iirc, so macrocolumns towards the "edge" of a region are sorta intertwined with macrocolumns on the other side of the "edge".)
(I thought when you mean V1 to V2 you include those too, but I guess you didn't?)
Do you think those inter-area non-long-distance connections are relatively unimportant, and if so why?
(I just read Kaj Sotala's "Subagents, trauma and rationality" post and thought I link it here because it's also saying interesting stuff about DID.)
Interregional connections (e.g. parietal lobe to prefrontal lobe, or V1 to V2) are fewer, and consistent enough between different people, and involve many fewer total connections, so they've all been pretty well described by modern neuroscience.
Wait, are you saying that not only is there quite low long-distance bandwidth, but also relatively low bandwidth between neighboring areas? Numbers would be very helpful.
And if there's much higher bandwidth between neighboring regions, might there not be a lot more information that's propagating long-range but only slowly through intermediate areas (or would that be too slow or sth?)?
(Relatedly, how crisply does the neocortex factor into different (specialized) regions? (Like I'd have thought it's maybe sorta continuous?))
(There might be a sorta annoying analysis one could do to test my hypothesis: on my hypothesis, the correlation between the intelligence of very intelligent parents and their children would be even a bit lower than on the just-independent-mutations hypothesis, because very intelligent people likely also got lucky in how their gene variants work together, and those properties would be unlikely to all be passed along and end up dominant.)
Thanks!
Is the following a fair paraphrasing of your main hypothesis? (I'm leaving out some subtleties with conjunctive successes, but please correct the model in that way if it's relevant.):
"""
Each deleterious mutation multiplies your probability of succeeding at a problem/thought by some constant. Let's for simplicity say it's 0.98 for all of them.
Then the expected number of successes per time for a person is proportional to 0.98^num_deleterious_mutations(person).
So the model would predict that when person A has 10 fewer deleterious mutations than person B, person B would on average accomplish 0.98^10 ~= 0.82 times as much as person A in a given timeframe.
"""
I think this model makes a lot of sense, thanks!
In itself I think it's insufficient to explain how heavytailed human intelligence is -- there were multiple cases where Einstein seems to have been able to solve problems multiple times faster than the next runners-up. But I think if you use this model in a learning setting where success means "better thinking algorithms", then having 10 fewer deleterious mutations is like having 1/0.82 times as much training, and there might also be compounding returns from better thinking algorithms to getting more and richer updates to them.
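To make that concrete, here's a toy sketch (all numbers are made up, just illustrating the shape of the effect, not a claim about real effect sizes):

```python
# Toy sketch of the two readings of the model (all numbers made up):
# (1) direct: each deleterious mutation multiplies success rate by 0.98;
# (2) learning: fewer mutations ~ proportionally more effective "training",
#     with crude compounding returns from better thinking algorithms.
p = 0.98
delta_mutations = 10

direct_ratio = p ** delta_mutations        # ~0.82x as many successes per unit time
effective_training = 1 / direct_ratio      # ~1.22x effective training

def final_ability(training_units: float, gain_per_unit: float = 0.03) -> float:
    # each "unit" of training improves one's thinking algorithms by a few percent
    return (1 + gain_per_unit) ** training_units

ratio_with_compounding = final_ability(30 * effective_training) / final_ability(30)
print(direct_ratio, effective_training, ratio_with_compounding)
```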
Not sure whether this completely deconfuses me about how heavytailed human intelligence is, but it's a great start.
I guess at least the heavytail is much less significant evidence for my hypothesis than I initially thought (though so far I still think my hypothesis is plausible).
Yeah, I know - that's why I said that if a major effect came through a few significantly deleterious mutations this would be more plausible. But I feel like human intelligence is even more heavytailed than what one would predict given this hypothesis.
If you have many mutations that matter, then via central limit theorem the overall distribution will be roughly gaussian even though the individual ones are exponential.
(If I made a mistake maybe crunch the numbers to show me?)
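Here's a quick simulation sketch of what I mean (all parameters are made up, just to show the shape of the resulting distribution):

```python
# Quick sketch of the claim above (all parameters made up): if very many
# variants each have a small multiplicative effect, the population
# distribution of the product is approximately Gaussian (a lognormal with
# tiny log-variance), i.e. not meaningfully heavy-tailed.
import numpy as np

rng = np.random.default_rng(0)
n_people = 1_000_000
n_variants = rng.binomial(2000, 0.5, size=n_people)  # small variants per person
ability = 0.998 ** n_variants                         # each one costs a factor 0.998

z = (ability - ability.mean()) / ability.std()
print("skewness:       ", round(float((z ** 3).mean()), 3))      # ~0.13, nearly symmetric
print("excess kurtosis:", round(float((z ** 4).mean() - 3), 3))  # ~0.03, nearly Gaussian tails
```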
(I initially misunderstood what you meant in a way where I thought it was complete nonsense.)
I don't understand what you're trying to say. Can you maybe rephrase again in more detail?
(Thanks. I don't think this is necessarily significant evidence against my hypothesis (see my comment on GeneSmith's comment).)
Another confusing relevant piece of evidence I thought I throw in:
Human intelligence seems to me to be very heavytailed. (I assume this is uncontroversial here; just look at the greatest scientists vs merely great scientists.)
If variance in intelligence were basically purely explained by mildly deleterious SNPs, this would seem a bit odd to me: if the average person had 1000 such SNPs, and then (using butt-numbers which might be very off) Einstein (+6.3std) had only 800 and the average theoretical physics professor (+4std) had 850, I wouldn't expect the difference there to be that big.
It's a bit less surprising on the model where most people have a few strongly deleterious mutations, and supergeniuses are the lucky ones that have only 1 or 0 of those.
It's IMO even a bit less surprising on my hypothesis where in some cases the different hyperparameters happen to work much better with each other -- where supergeniuses are in some dimensions "more lucky than the base genome" (in a way that's not necessarily easy to pass on to offspring though because the genes are interdependent, which is why the genes didn't yet rise to fixation). But even there I'd still be pretty surprised by the heavytail.
The heavytail of intelligence really confuses me. (Given that it doesn't even come from sub-critical intelligence explosion dynamics.)
Thanks.
So I only briefly read through the section of the paper, but not really sure whether it applies to my hypothesis: My hypothesis isn't about there being gene-combinations that are useful which were selected for, but just about there being gene-combinations that coincidentally work better without there being strong selection pressure for those to quickly rise to fixation.
(Also yeah for simpler properties like how much milk is produced I'd expect a much larger share of the variance to come from genes which have individual contributions. Also for selection-based eugenics the main relevant thing are the genes which have individual contribution. (Though if we have precise ability to do gene editing we might be able to do better and see how to tune the hyperparameters to fit well together.))
Please let me know whether I'm missing something though.
Thanks for confirming.
To clarify in case I'm misunderstanding, the effects are additive among the genes explaining the part of the IQ variance which we can so far explain, and we count that as evidence that for the remaining genetically caused IQ variance the effects will also be additive?
I didn't look into how the data analysis in the studies was done, but on my default guess this generalization does not work well / the additivity of the currently identified SNPs isn't significant counterevidence against my hypothesis:
I'd imagine that studies just correlated individual gene variants with IQ and thereby found gene variants that have independent effects on intelligence. Or did they also look at pairwise or triplet gene-variant combinations and correlate those with IQ? (There would be quite a lot of pairs, and I'm not sure whether the current datasets are large enough to robustly distinguish the combinations that really have good/bad effects from false positives.)
One would of course expect that the effects of the gene variants which have independent effects on IQ are additive.
But overall, unless the studies did look for higher-order IQ correlations, the fact that the IQ variance we can explain so far comes from genes with independent effects isn't significant evidence that the remaining genetically-caused IQ variance also comes from gene variants with independent effects, because we were bound to find the genes that do have independent effects much more readily.
(I think the above should be sufficient explanation of what I think but here's an example to clarify my hypothesis:
Suppose gene A has variants A1 and A2 and gene B has B1 and B2. Suppose that A1 can work well with B1 and A2 with B2, but the other interactions don't fit together that well (like badly tuned hyperparameters) and result in lower intelligence.
When we only look at e.g. A1 vs A2, neither is independently better than the other -- they are uncorrelated with IQ. Studies would need to look at combinations of variants to see that e.g. A1+B1 has a slight positive correlation with intelligence -- and I'm doubting whether studies did that (and whether we have sufficient data to see the signal among the combinatorial explosion of possibilities); it would be helpful if someone briefly clarified to me how the studies did the data analysis.
)
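Here's a tiny simulation of that toy example (made-up effect sizes), showing why single-variant association tests would miss it:

```python
# Tiny simulation of the A/B toy example above (made-up effect sizes):
# A1+B1 and A2+B2 "fit together" and give a small IQ bonus, mismatches don't,
# so neither variant shows a marginal (single-SNP) association with IQ even
# though the pair explains real variance.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.integers(0, 2, n)   # 0 = A1, 1 = A2
b = rng.integers(0, 2, n)   # 0 = B1, 1 = B2
iq = 100 + 5 * (a == b) + rng.normal(0, 15, n)   # bonus only when the variants match

print("corr(A, IQ):    ", round(float(np.corrcoef(a, iq)[0, 1]), 3))                       # ~0
print("corr(B, IQ):    ", round(float(np.corrcoef(b, iq)[0, 1]), 3))                       # ~0
print("corr(A==B, IQ): ", round(float(np.corrcoef((a == b).astype(float), iq)[0, 1]), 3))  # ~0.16
```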
By the same token, I’m happy to defend a claim along the lines of “intrinsic unpredictability is the seed / core at the center of concepts like animation, vitality, agency, etc.”
Well I don't think that intrinsic unpredictability explains the sense of lifeforce or whatever.
(What seems possible is that something like hard-to-predict (and purposeful?) behavior triggers human minds to model an object as interfaced to an invisible mind/soul/spirit, and the way humans model such souls is particular in some way which explains the sense of lifeforce.)
I think when humans model other minds (which includes animals (and gods)) they start from a pre-built template (potentially from mirroring part of their own cognitive machinery) with properties goals/desires, emotions, memory and beliefs.
When your dog dies, the appearance of lifeforce disappearing might've been caused by seeing that the dead body is now very predictable, but the explanation isn't that the sense of unpredictability went away - rather it's something to do with your whole model of your dog's mind no longer having any predictive power. (I don't know yet what exactly might cause the sense of life force.)
Tbc what I'm imagining when saying "intrinsic unpredictability" is a reductionist model of how some machinery in the mind works, sorta like the model that explains frequentist[1] intuitions that a coin has an inherent 50% probability to come up heads. (I do NOT mean that an "intrinsic unpredictability" tag gets attached to the object which then needs to get interpreted by some abstract-modelling machinery.)
As an example of a reductionist explanation, consider the frequentist intuition that it is a fact about the world that a coin comes up heads with 50% probability. This can be explained by saying that such agents model the world as a probabilistic environment with P(coin=heads)=50%. (As opposed to modelling worlds as deterministic environments where the outcome of an experiment is fixed, and then having probabilistic uncertainty about which world one is in.)
I don't know precisely how to model "intrinsic unpredictability" yet, but if I'm looking for the part that explains why it seems unintuitive to us to think of ourselves (and others?) as deterministic, it could be that we model minds as "intrinsically probabilistic" just like in the coin case, or it might be a bit different, like a part predicting that the model of the vitalistic object gets constantly updated as we observe it. (I previously didn't think about it clearly and had a slightly different guess for how it might be implemented, but it wasn't a full coherent picture that made sense.)
In case you do think that "intrinsic unpredictability" explains the sense of lifeforce, I think this is a mysterious answer.
Harry gasped for breath, "but what is going on?"
"Magic," said Professor McGonagall.
"That's just a word! Even after you tell me that, I can't make any new predictions! It's exactly like saying 'phlogiston' or 'elan vital' or 'emergence' or 'complexity'!"
(chapter 6, HPMoR)
(To clarify: Even though it is magic, I think Harry is correct here that it's not an explanation.)
Also, I think you're aware of this, but nothing is inherently meaningful; meaning can only arise through how something relates to something else. In the cold case (where I assume you talk about mental-physiological reactions to freezing/feeling-cold (as opposed to modelling the temperature of objects)), the meaning of "cold" comes from the cluster of sensations it refers to and how it affects considerations. If you just had the information "type-ABC (aka 'cold') sensors fired at position-XYZ", the rest of the mind wouldn't know what to do with that information on its own; it needs some circuitry to relate the information to other events. So I wouldn't say what you wrote explains cold, but maybe you didn't think it did.
- ^
I might not be fair to frequentists and don't really know their models. I just don't know how else to easily call it because it seems some people like Eliezer might not have had such intuitions.
So what's missing?
I haven't looked at any of the studies and also don't know much about genomics so my guess might be completely wrong, but a different hypothesis that seems pretty plausible to me is:
Most of the variance in intelligence comes from how well different genes/hyperparameters-of-the-brain can work together, rather than from them having individually independent effects on intelligence. E.g., as a made-up, specific, implausible example (I don't know that much neuroscience): there could be different genes controlling the size, the synapse density, and the learning/plasticity rate of cortical columns in some region, and some combinations of those hyperparameters happen to work well together while others don't fit quite as well.
So this hypothesis would predict that we didn't find the remaining genetic component for intelligence yet because we didn't have enough data to see what clusters of genes together have good effects and we also didn't know in what places to look for clusters.
This is a nitpick, but I think you’re using the word “pro-social” when you mean something more like “doing socially-endorsed things”. For example, if a bully is beating up a nerd, he’s impressing his (bully) friends, and he’s acting from social motivations, and he’s taking pride in his work, and he’s improving his self-image and popularity, but most people wouldn’t call bullying “pro-social behavior”, right?
Agreed.
Incidentally, I think your description is an overstatement. My claim is that “the valence our "best"/pro-social selves would ascribe” is very relevant to the valence of self-reflective thoughts, to a much greater extent than non-self-reflective thoughts. But they’re not decisive. That’s what I was suggesting by my §2.5.2 example of “Screw being ‘my best self’, I’m tired, I’m going to sleep”.
Also agreed.
Re your reply to my first question:
I think that makes sense iiuc. Does the following correction to my model seem correct?:
I was thinking of it like "self-reflective thoughts have some valence ---causes---> the model of the homunculus gets described as wanting those things for which self-reflective thoughts have positive valence". But actually your model is like "there are beliefs about what the model of the homunculus wants ---causes---> self-reflective thoughts get higher valence if they fit what the homunculus wants". (Where I think for many people "what the homunculus wants" is sorta a bit editable and changes in different situations depending on which subagents are in control.)
Re your reply to my second question:
So I'm not sure what your model is, but as far as I understand it, it seems like the model says "the valence of S(X) heavily depends on what the homunculus wants" and "what the homunculus wants is determined by which goals there is sophisticated brainstorming towards, which are the goals where S(X) has positive valence". And it's possible that such a circularity is there, but that alone doesn't explain to me why the homunculus' preferences usually end up in the "socially-endorsed" attractor.
I mean, another way to phrase the question might be "why is there a difference between ego-syntonic and positive valence? why not just one thing?". And yeah, it's possible that the answer here doesn't really require anything new, and it's just that the way valence is naturally coded in our brain is stupid and incoherent, and the homunculus-model has higher consistency pressure which straightens out the reflectively endorsed values to be more coherent and in particular neglects myopic high-valence urges.
And the homunculus-model ends up with socially-endorsed preferences because modelling what thoughts come up in the mind is pretty intertwined with language, and it makes sense that for language-related thoughts the "is this socially endorsed" thought assessors are particularly strong. Not sure whether that's the whole story though.
(Also I think ego-dystonic goals can sometimes still cause decently sophisticated brainstorming, especially if they come from urges that other parts try to suppress and which thus learn to "hide their thoughts". Possibly related: people often rationalize about why to do something.)
Anyway, I think there’s an innate drive to impress the people who you like in turn. I’ve been calling it the drive to feel liked / admired. It is certainly there for evolutionary reasons, and I think that it’s very strong (in most people, definitely not everyone), and causes a substantial share of ego-syntonic desires, without people realizing it. It has strong self-reflective associations, in that “what the people I like would think of me” centrally involves “me” and what I’m doing, both right now and in general. It’s sufficiently strong that there tends to be a lot of overlap between “the version of myself that I would want others to see, especially whom I respect in turn” versus “the version of myself that I like best all things considered”.
I think that’s similar to what you’re talking about, right?
Yeah, sorta. I think what I wanted to get at is that it seems to me that people often think of themselves as (wanting to be) nicer than their behavior would actually imply (though maybe I overestimated how strong that effect is), and I wanted to look for an explanation of why.
(Also I generally want to get a great understanding of what values end up being reflectively endorsed and why -- this seems very important for alignment.)
Sorry about your dog.
So I agree that there's this introspective sense that feels more like something one would call "vitalistic force". However, I do not think that all of the properties of how we experience animals come from just our mind attaching "inherent surprisingness". Rather, humans have a tendency to model animals as having some mind/soul stuff, which probably entails more than just agency and inherent surprisingness, though I don't know precisely what.
Like, if you say that vitalistic force is inherent surprisingness, and that vitalistic force explains the sense of vitality or life-force we see in living creatures, you're sneaking in connotations. "Vitalistic force" is effectively a mysterious answer for most of the properties you're trying to explain with it. (Or a placeholder like "the concept I don't understand yet but give a name to".)
(Like, IMO it's important to recognize that saying "the inherent-surprisingness/vitalistic-force my mind paints on objects explains my sense of animals having life-force" is not actually a mechanistic hypothesis -- I would not advance-predict a sense of life-force from thinking that minds project their continuous surprise about an object as a property onto the object itself. Not sure whether you're making this mistake though.)
(I guess my initial interpretation was correct then. I just later wrongly changed my interpretation, because for your free will reduction all that is needed is the thing that the vitalistic-force/inherent-surprisingness hypothesis does in fact properly explain.)
I mostly expect you start getting more and more into sub-critical intelligence-explosion dynamics as you exceed +6std further. (E.g. see the second half of this other comment I wrote.) I also expect very smart people will be better able to set up computer-augmented note-organizing systems, or maybe code narrow aligned AIs that might help them with their tasks (in a way that's a lot more useful than current LLMs but hard for other people to use). But idk.
I'm not sure how big the difference between +6 and +6.3std actually is. I also might've confused the actual-competence vs genetic-potential scale. On the scale I used, drive/"how hard one is trying" also plays a big role.
I actually mostly expect this from seeing that intelligence is pretty heavy-tailed. E.g. alignment research capability seems incredibly heavy-tailed to me, though it might be hard to judge the differences in capability there if you're not already one of the relatively few people who are good at alignment research. Another example is how Einstein managed to find general relativity, where the rest of the world combined wouldn't have been able to do it that way without more experimental evidence.
I do not know why this is the case. It is (very?) surprising to me. Einstein didn't even work on understanding and optimizing his mind. But yeah that's how I guess.
(Also, quick feedback in case it's somehow useful: when I reread the post today I was surprised that by "vitalistic force" you really just mean inherent unpredictability and not more. You make this pretty clear, so me thinking some time after the first read that you meant it to explain more (which you don't) is fully on me, but I still think it might've been easier to understand if you had just called it "inherent unpredictability" instead of "vitalistic force".)
I feel like I'm still confused on 2 points:
- Why is, according to your model, the valence of self-reflective thoughts sorta the valence our "best"/pro-social selves would ascribe?
- Why does the homunculus get modeled as wanting pro-social/best-self stuff (as opposed to just what overall valence would imply)?
(I'd guess that there was evolutionary pressure for a self-model/homunculus to seem more pro-social than the overall behavior (and thoughts) of the human would imply, so I guess there might be some particular programming from evolution in that direction. I don't know exactly what it might look like though. I also wouldn't be shocked if it's mostly just that all the non-myopic desires are pretty pro-social and the self-model's values get straightened out in a way where the myopic desires end up dropped because keeping them would be incoherent. Would be interested in hearing your model on my questions above.)