Due to the generosity of ARIA, we will be able to offer a refund proportional to attendance, with a full refund for completion. The cost of registration is $200, and we plan to refund $25 for each week attended, as well as the final $50 upon completion of the course. We’ll ask participants to pay the registration fee once the cohort is finalized, so no fee is required to fill out the application form below.
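As a rough illustration of the refund schedule described above, here is a minimal sketch. The course length of 6 weeks is not stated in the announcement; it is inferred from the requirement that $25 per week plus the final $50 adds up to the full $200. The function name `refund` is hypothetical.

```python
FEE = 200               # registration fee in dollars (from the announcement)
PER_WEEK = 25           # refund per week attended (from the announcement)
COMPLETION_BONUS = 50   # final refund upon completion (from the announcement)

# Assumption: the number of weeks is inferred so that full attendance
# plus completion gives a full refund: (200 - 50) / 25 = 6.
WEEKS = (FEE - COMPLETION_BONUS) // PER_WEEK


def refund(weeks_attended, completed):
    """Return the refund in dollars for a given attendance record."""
    weeks = max(0, min(weeks_attended, WEEKS))  # clamp to the course length
    total = weeks * PER_WEEK + (COMPLETION_BONUS if completed else 0)
    return min(total, FEE)  # never refund more than was paid


print(refund(6, True))   # full attendance and completion
print(refund(3, False))  # dropped out after three weeks
```

Under this reading, the refund works as an incentive to keep attending rather than as a money-back guarantee.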
Wait so do we get a refund if we decide we don't want to do the course, or if we manage to complete the course?
Like is it a refund in the "get your money back if you don't like it" sense, or is it incentive to not sign up and then not complete the course?
Nice post!
My key takeaway: "A system is aligned to human values if it tends to generate optimized-looking stuff which is aligned to human values."
I think this is useful progress. In particular it's good to try to aim for the AI to produce some particular result in the world, rather than trying to make the AI have some goal - it grounds you in the thing you actually care about in the end.
I'd say the "... aligned to human values part" is still underspecified (and I think you at least partially agree):
- "aligned": what does the ontology translation between the representation of the "generated optimized-looking stuff" and the representation of human values look like?
- "human values"
- I think your model of humans is too simplistic. E.g. at the very least it's lacking a distinction like the one between "ego-syntonic" and "voluntary" as in this post, though I'd probably want an even significantly more detailed model. Also one might need different models for very smart and reflective people than for most people.
- We haven't described value extrapolation.
- (Or from an alternative perspective, our model of humans doesn't identify their relevant metapreferences (which probably no human knows fully explicitly, and for some/many humans they might not be really well defined).)
Positive reinforcement for first trying to better understand the problem before running off and trying to solve it! I think that's the way to make progress, and I'd encourage others to continue work on more precisely defining the problem, and in particular on getting better models of human cognition to identify how we might be able to rebind the "human values" concept to a better model of what's happening in human minds.
Btw, I'd have put the corrigibility section into a separate post, it's not nearly up to the standards of the rest of this post.
To set expectations: this post will not discuss ...
Maybe you want to add here that this is not meant to be an overview of alignment difficulties, or an explanation for why alignment is hard.
Agreed that people focus a bit too much on scheming. It might be good for some people to think a bit more about the other failure modes you described, but the main thing that needs doing is very smart people making progress towards building an aligned AI, not defending against particular failure modes. (However, most people probably cannot usefully contribute to that, so maybe focusing on failure modes is still good for most people. In any case, though, there's the problem that people will find proposals that very likely don't actually work but that are easy to believe in, thereby making an AI stop a bit less likely.)
In general, I wish more people would make posts about books without feeling the need to do boring parts they are uninterested in (summarizing and reviewing) and more just discussing the ideas they found valuable. I think this would lower the friction for such posts, resulting in more of them. I often wind up finding such thoughts and comments about non-fiction works by LWers pretty valuable. I have more of these if people are interested.
I liked this post, thanks and positive reinforcement. In case you didn't already post your other book notes, just letting you know I'd be interested.
Do we have a sense for how much of the orca brain is specialized for sonar?
I don't know.
But evolution slides functions around on the cortical surface, and (Claude tells me) association areas like the prefrontal cortex are particularly prone to this.
It's particularly bad for cetaceans. Their functional mapping looks completely different.
Thanks. Yep I agree with you, some elaboration:
(This comment assumes you at least read the basic summary of my project (or watched the intro video).)
I know of Earth Species Project (ESP) and CETI (though I only read 2 publications of ESP and none of CETI).
I don't expect them to succeed in something equivalent to decoding orca language to an extent that we could communicate with them almost as richly as they communicate among each other. (Though like, if long-range sperm whale signals are a lot simpler they might be easier to decode.)
From what I've seen, they are mostly trying to throw AI at stuff and hoping somehow they will understand stuff, without having a clear plan how to actually decode it. The AI stuff might look advanced but it's sorta obvious things to try and I think it's unlikely to work very well, though still glad they are trying this.
If you look at orca vocalizations, it looks complex and alien. The patterns we can currently recognize there look very different from what we'd be able to see in an unknown human language. The embedding mapping might be useful if we had to decode a human language, and maybe we still learn some useful stuff from it, but for orca language we don't even know what their analog of words and sentences are and maybe their language works even somewhat differently (though I'd guess if they are smarter than humans there's probably going to be something like words and sentences - but they might be encoded differently in the signals than in human languages).
It's definitely plausible that AI can help significantly with decoding animal languages, but I think it also requires forming a deep understanding of some things, and I think it's likely too hard for ESP to succeed anytime soon. It's possible a supergenius could do it in a few years, but it would be really impressive.
My approach may fail, especially if orcas aren't at least roughly human-level smart, but it has the advantage that we can show orcas precise context of what some words and sentences mean, whereas we basically have almost no context data on recordings of orca vocalizations, so it's easier for them to see what some signals mean than for humans to infer what orca vocalizations mean. (Even if we had a lot of video datasets with vocalizations (which we don't), it's still a lot less context information about what they are talking about, than if they could show us images to indicate what they would talk about.) Of course humans have more research experience and better tools for decoding signals, but it doesn't look to me like anyone is currently remotely close, and my approach is much quicker to try and might have at least a decent chance. (I mean it nonzero worked with bottlenose dolphins (in terms of grammar better than with great apes), though I'd be a lot more ambitious.)
Of course, the language I create will also be alien for orcas, but I think if they are good enough at abstract pattern recognition they might still be able to learn it.
Perhaps also not what you're looking for, but you could check out the google hashcode archive (here's an example problem). I never participated though, so I don't know whether they would make that great tests. But it seems to me like general ad-hoc problem solving capabilities are more useful in hashcode than in other competitive programming competitions.
GPT4 summary: "Google Hash Code problems are real-world optimization and algorithmic challenges that require participants to design efficient solutions for large-scale scenarios. These problems are typically open-ended and focus on finding the best possible solution within given constraints, rather than exact correctness."
Maybe not what you're looking for because it's not like one hard problem but more like many problems in a row, and generally I don't really know whether they are difficult enough, but you could (have someone) look into Exit games. Those are basically like escape rooms to go. I'd filter for Age16+ to hopefully filter for the hard ones, though maybe you'd want to separately look up which are particularly hard.
I did one or two when I was like 15 or 16 years old, and recently remembered them and I want to try some more for fun (and maybe also introspection), though I didn't get around to it yet. I think they are relatively ad-hoc puzzles though as with basically anything you can of course train to get good at Exit games in particular by practicing. (It's possible that I totally overestimate the difficulty and they are actually more boring than I expect.)
(Btw, probably even less applicable to what you are looking for, but CondingEscape is also really fun. Especially the "Curse of the five warriors" is good.)
I hope I will get around to rereading the post and edit this comment to write a proper review, but I'm pretty busy, so in case I don't I now leave this very shitty review here.
I think this is probably my favorite post from 2023. Read the post summary to see what it's about.
I don't remember a lot of the details from the post and so am not sure whether I agree with everything, but what I can say is:
- When I read it several months ago, it seemed to me like an amazingly good explanation for why and how humans fall for motivated reasoning.
- The concept of valence turned out very useful for explaining some of my thought processes, e.g. when I'm daydreaming something and asking myself why, then for the few cases where I checked it was always something that falls into "the thought has high valence" - like e.g. imagining some situation where I said something that makes me look smart.
Another thought, though I don't actually have any experience with this, but mostly doing attentive silent listening/observing might also be useful for learning how the other person is doing research.
Like, if it seems boring to just observe and occasionally say sth, try to better predict how the person will think or so.
The main reason I'm interested in orcas is because they have 43 billion cortical neurons, whereas the 2 land animals with the most cortical neurons (where we have optical-fractionator measurements) are humans and chimpanzees with 21 billion and 7.4 billion respectively. See:
Pilot whales is the other species I'd consider for experiments - they have 37.2 billion cortical neurons.
For sperm whales we don't have data on neuron densities (though they do have the biggest brains). I'd guess they are not quite as smart though because they can dive long and they AFAIK don't use very collaborative hunting techniques.
Cool, thanks!
Cool, thanks, that was useful.
(I'm creating a language for communicating with orcas, so the phonemes will be relatively unpractical for humans. Otherwise the main criteria are simple parsing structure and easy learnability. (It doesn't need to be super perfect - the perhaps bigger challenge is to figure out how to teach abstract concepts without being able to bootstrap from an existing language.) Maybe I'll eventually create a great rationalist language for thinking effectively, but not right now.)
Is there some resource where I can quickly learn the basics of the Esperanto composition system? Somewhere I can see the main base dimensions/concepts?
I'd also be interested in anything you think was implemented particularly well in a (con)language.
(Also happy to learn from you rambling. Feel free to book a call: )
But most likely, this will all be irrelevant for orcas. Their languages may be regular or irregular, with fixed or random word order, or maybe with some categories that do not exist in human languages.
Yeah I was not asking because of decoding orca language but because I want inspiration for how to create the grammar for the language I'll construct. Esperanto/Ido also because I'm interested in how well word-compositionality is structured there and whether it is a decent attempt at outlining the basic concepts that other concepts are composites of.
Currently we basically don't have any datasets where it's labelled what orca says what. When I listen to recordings, I cannot distinguish voices, though idk it's possible that people who listened a lot more can. I think just unsupervised voice clustering would probably not work very accurately. I'd guess it's probably possible to get data on who said what by using an array of hydrophones to infer the location of the sound, but we need very accurate position inference because different orcas are often just 1-10m distance from each other, and for this we might need to get/infer decent estimates of how water temperature varies by depth, and generally there have not yet been attempts to get high precision through this method. (It's definitely harder in water than in air.)
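To make the hydrophone-array idea concrete, here is a minimal sketch of locating a sound source from time differences of arrival (TDOA). It is not a description of any existing orca-tracking system: the 2D geometry, the hydrophone positions, the brute-force grid search, and the constant sound speed of 1500 m/s are all simplifying assumptions, and the function names `tdoas` and `locate` are hypothetical.

```python
import math

# Assumption: constant sound speed. In real seawater it varies with
# temperature and depth, which is exactly what limits precision.
SOUND_SPEED = 1500.0  # m/s


def tdoas(source, hydrophones, c=SOUND_SPEED):
    """Arrival-time differences (seconds) relative to the first hydrophone."""
    times = [math.dist(source, h) / c for h in hydrophones]
    return [t - times[0] for t in times[1:]]


def locate(measured, hydrophones, extent=50.0, step=0.5, c=SOUND_SPEED):
    """Grid-search the 2D point whose predicted TDOAs best match the measured ones."""
    best, best_err = None, float("inf")
    n = int(2 * extent / step) + 1
    for i in range(n):
        for j in range(n):
            p = (-extent + i * step, -extent + j * step)
            predicted = tdoas(p, hydrophones, c)
            err = sum((a - b) ** 2 for a, b in zip(predicted, measured))
            if err < best_err:
                best, best_err = p, err
    return best


# Hypothetical layout: four hydrophones on a 30 m square, one calling orca.
hydrophones = [(0.0, 0.0), (30.0, 0.0), (0.0, 30.0), (30.0, 30.0)]
true_position = (12.0, 7.5)
measured = tdoas(true_position, hydrophones)

estimate = locate(measured, hydrophones)
print("estimate with correct sound speed:", estimate)

# Same measurements, but the solver assumes a slightly wrong sound speed.
biased = locate(measured, hydrophones, c=1480.0)
print("estimate with mis-specified sound speed:", biased)
```

Passing a different `c` to `locate` than the one used to generate the measurements gives a feel for how errors in the assumed sound speed shift the estimate, which matters when individual orcas are only 1-10 m apart.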
Yeah basically I initially also had rough thoughts into this direction, but I think the create-and-teach language way is probably a lot faster.
I think the earth species project is trying to use AI to decode animal communication, though they don't focus on orcas in particular, but many species including e.g. beluga whales. Didn't look into it a lot but seems possible I could do sth like this in a smarter and more promising way, but probably still would take long.
Thanks for your thoughts!
I don't know what you'd consider enough recordings, and I don't know how much decent data we have.
I think the biggest datasets for orca vocalizations are the orchive and the orcasound archive. I think they each are multiple terabytes big (from audio recordings) but I think most of it (80-99.9% (?)) is probably crap where there might just be a brief very faint mammal vocalization in the distance.
We also don't have a way to see which orca said what.
Also orcas from different regions have different languages, and orcas from different pods different dialects.
I currently think the decoding path would be slower, and yeah the decoding part would involve AI but I feel like people just try to use AI somehow without a clear plan, but perhaps not you.
What approach did you imagine?
In case you're interested in a small amount of high-quality data (but still without annotations):
I think LTFF would take way too long to get back to me though. (Also they might be too busy to engage deeply enough to get past the "seems crazy" barrier and see it's at least worth trying.)
Also btw I mostly included this in case someone with significant amounts of money reads this, not because I want to scrap it together from small donations. I expect higher chances of getting funding come from me reaching out to 2-3 people I know (after I know more about how much money I need), but this is also decently likely to fail. If this fails I'll maybe try Manifund, but would guess I don't have good chances there either, but idk.
Actually out of curiosity, why 4x? (And what exactly do you mean by "2x larger"?) (And is this for a naive algorithm which can be improved upon or a tight constraint?)
Thanks for pointing that out! I will tell my friends to make sure they actually get good data for the metabolic cost and not just use cortical neuron count as proxy if they cannot find something good.
(Or is there also another point you wanted to make?) And yeah it's actually also an argument for why orcas might be less intelligent (if they sorta use their neurons less often). Thanks.
My guess is that there probably aren't a lot of simple mutations which just increase intelligence without increasing cortical neuron count. (Though probably simple mutations can shift the balance between different sub-dimensions of intelligence as constrained through cortical neuron count.) (Also of course any particular species has a lot of deleterious mutations going around and getting rid of those may often just increase intelligence, but I'm talking about intelligence-increasing changes to the base genome.)
But there could be complex adaptations that are very important for abstract reasoning. Metacognition and language are the main ones that come to mind.
So even if the experiment my friends do shows that the number of cortical neurons is a strong indicator, it could still be that humans were just one of the rare cases which evolved a relevant complex adaptation. But it would be significant evidence for orcas being smarter.
An argument against orcas being more intelligent than humans runs thus: Orcas are much bigger than humans, so the fraction of the metabolic cost the brain consumes is smaller than in humans. Thus it took more selection pressure for humans to evolve having 21 billion neurons than for orcas to have 43 billion.[1] Thus humans might have other intelligence-increasing mutations that orcas didn't evolve yet.
So the question here is "how much does scale matter vs other adaptations". Luckily, we can get some evidence on that by looking at other species and rating how intelligent they are and correlating that with (1) number of cortical neurons and (2) fraction of metabolic cost the brain uses, to see how strong of an indicator each is for intelligence.
I have two friends who are looking into this for a few hours on the side (where one tries to find cortical neurons and metabolic cost data, and the other looks at animal behavior to rate intelligence (without knowing about neuron count or so)). It'll be rather a crappy estimate but hopefully we at least have some evidence from this in a week.
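The comparison described above could be sketched as a rank correlation between an intelligence rating and each candidate predictor. This is only an illustration of the approach, not the analysis my friends are running: the choice of Spearman correlation and the function names are assumptions, and the example inputs are abstract toy lists rather than real species data.

```python
import math


def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy placeholder inputs (NOT real measurements): one rating per species,
# plus the two candidate predictors in the same species order.
intelligence_rating = [1, 2, 3, 4, 5]
predictor_a = [10, 20, 30, 40, 50]   # stands in for cortical neuron count
predictor_b = [5, 3, 4, 1, 2]        # stands in for brain share of metabolic cost

print("predictor A:", spearman(intelligence_rating, predictor_a))
print("predictor B:", spearman(intelligence_rating, predictor_b))
```

A rank correlation seems like a reasonable default here because the intelligence ratings are ordinal at best, but with only a handful of species any such estimate will be noisy.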
- ^
Of course metabolic cost doesn't necessarily need to be linear in the number of cortical neurons, but it'd be my default guess, and in any case I don't think it matters for gathering evidence across other species as long as we can directly get data on the fraction of the metabolic cost the brain uses (rather than estimating it through neuron count).
Another thought:
In what animals would I on priors expect intelligence to evolve?
- Animals which use collaborative hunting techniques.
- Large animals. (So the neurons make up a smaller share of the overall metabolic cost.)
- Animals that can use tools so they benefit more from higher intelligence.
- (perhaps some other stuff like cultural knowledge being useful, or having enough slack for intelligence increase from social dynamics being possible.)
AFAIK, orcas are the largest animals that use collaborative hunting techniques.[1] That plausibly puts them second behind humans for where I would expect intelligence to evolve. So it doesn't take that much evidence for me to be like "ok looks like orcas also fell into some kind of intelligence attractor".
- ^
Though I heard sperm whales might sometimes collaborate too, but not in nearly as sophisticated a way I guess. But I also wouldn't be shocked if sperm whales are very smart. They have the biggest animal brains, but I don't know whether the cortical neuron count is known.
Main pieces I remember were: Orcas already dominating the planet (like humans do), large sea creatures going extinct due to orcas (similar to how humans drove several species extinct, (Megalodon? Probably extinct for different reasons, weak evidence against? Most other large whales are still around)).
To clarify for other readers: I do not necessarily endorse this is what we would expect if orcas were smart.
(Also I read somewhere that apparently chimpanzees sometimes/rarely can experience menopause in captivity.)
If the species is already dominating the environment then the pressure from the first component compared to the second decreases.
I agree with this. However I don't think humans had nearly sufficient slack for most of history. I don't think they dominated the environment up until 20,000 years ago[1] or so, and I think most improvements in intelligence come from earlier.
That's why I'm attributing the level of human intelligence in large part to runaway sexual selection. Without it, as soon as interspecies competition became the most important for reproductive success, natural selection would not push for even greater intelligence in humans, even though it could improve our ability to dominate the environment even more.
I'm definitely not saying that group selection led to intelligence in humans (only that group selection would've removed it over long timescales if it wasn't useful). However I think that there were (through basically all of human history) significant individual fitness benefits from being smarter that did not come from outwitting each other, e.g. being better able to master hunting techniques and thereby gaining higher status in the tribe.
- ^
Or could also be 100k years, idk
I'm not sure how it's relevant.
I thought if humans were vastly more intelligent than they needed to be they would already learn all the relevant knowledge quickly enough so they reach their peak in the 20s.
And if the trait, the runaway sexual selection is propagating, is itself helpful in competition with other species, which is obviously true for intelligence, there is just no reason for such straightening over a long timescale.
I mean for an expensive trait like intelligence I'd say the benefits need to at least almost be worth the costs, and then I'd rather attribute the selection for intelligence to "because it was useful" than to "because it was a runaway selection".
(For reference I think Tsvi and GeneSmith have much more relevant knowledge for evaluating the chance of superbabies being feasible and I updated my guess to like 78%.)
(As it happens I also became more optimistic about the orca plan (especially in terms of how much it would cost and how long it would take, but also a bit in how likely I think it is that orcas would actually study science) (see footnote 4 in post). For <=30y timelines I think the orca plan is a bit more promising, though overall the superbabies plan is more promising/important. I'm now seriously considering pivoting to the orca plan though.) (EDIT: tbc I'm considering pivoting from alignment research, not superbaby research.)
(haha cool. perhaps you could even PM Abram if he doesn't PM you. I think it would be pretty useful to speed up his agenda through this.)
I agree that sexual selection is a thing - that it's the reason for e.g. women sometimes having unnecessarily large breasts.
But I think it gets straightened out over long timescales - and faster the more expensive the trait is. And intelligence seems ridiculously expensive in terms of the metabolic energy our brain uses (or childbirth mortality).
A main piece that updated me was reading anecdotes in Scott Alexander's Book review of "The Secret of our success" where I now think that humans did need their intelligence for survival. (E.g. 30 year old hunter gatherers perform better at hunting etc than hunter gatherers in their early 20s, even though the latter are more physically fit.)
A few more thoughts:
It's plausible that for both humans and orcas the relevant selection pressure mostly came from social dynamics, and it's plausible that there were different environmental pressures.
Actually my guess would be that it's because intelligence was environmentally adaptive, because my intuitive guess would be that group selection[1] is significant enough over long timescales which would disincentivize intelligence if it's not already (almost) useful enough to warrant the metabolic cost, unless the species has a lot of slack.
So an important question is: How adaptive is high intelligence?
In general I would expect that selection pressure for intelligence was significantly stronger in humans, but maybe for orcas it was happening over a lot longer time window, so the result for orcas could still be more impressive.
From what I observed about orca behavior I'd perhaps say a lower bound of their intelligence might roughly be like human 15 year olds or so. So up to that level of intelligence there seem to be benefits that allow orcas to use more sophisticated hunting techniques.
But would it be useful for orcas to be significantly smarter than humans? My prior intuition would've been that probably not very much.
But I think observing the impressive orca brains mostly screens this off: I wouldn't have expected orcas to evolve to be that smart, and I similarly strongly wouldn't have expected them to have that impressive brains, and seeing their brains updates me that there had to be some selection pressure to produce that.
But the selection pressure for intelligence wouldn't have needed to be that strong compared to humans for making the added intelligence worth the metabolic cost, because orcas are large and their neurons make up a much smaller share of their overall metabolic consumption. (EDIT: Actually (during some (long?) period of orca history) selection pressure for intelligence also would've needed to be stronger than selection pressure for other traits (e.g. making muscles more efficient or whatever).)
And that there is selection pressure is not totally implausible in hindsight:
- Orcas hunt very collaboratively, and maybe there are added benefits from coordinating their attacks better. (Btw, orcas live in matrilines, and I'd guess that from an evolutionary perspective the key thing to look at is how well a matriline performs, not individuals, but not sure. So there would be high selection for within-matriline cooperation (and perhaps communication!).)
- Some/(many?) Orca sub-species prey on other smart animals like dolphins or whales, and maybe orcas needed to be significantly smarter to be able to outwit the defensive mechanisms they learn to adapt.
But overall I know way too little about orca hunting techniques to be able to evaluate those.
ADDED 2024-11-29:
To my current (not at all very confident) knowledge, orcas split off from other still alive dolphin species 5-10 million years ago (so sorta similar to humans - maybe slightly longer for orcas). So selection pressure must've been relatively strong I guess.
Btw, bottlenose dolphins (which have iirc 12.5 billion cortical neurons) are to orcas sorta like chimps are to humans. One could look how smart bottlenose dolphins are compared to chimps.
(There are other dolphin species (like pilot whales) which are probably smarter than bottlenose dolphins, but those aren't studied more than orcas, whereas bottlenose dolphins are.)
- ^
I mean group selection that could potentially be on a level of species where species go extinct. Please lmk if that's actually called differently.
thanks. Can you say more about why?
I mean runaway sexual selection is basically H1, which I updated to being less plausible. See my answer here. (You could comment there why you think my update might be wrong or so.)
My prior intuitive guess would be that H1 seems quite a decent chunk more likely than H2 or H3.
Actually I changed my mind.
Why I thought this before: H1 seems like a potential runaway-process and is clearly about individual selection which has stronger effects than group selection (and it was mentioned in HPMoR).
Why I don't think this anymore:
- It would also be an incredibly huge coincidence if intelligence mostly evolved because of social dynamics but happened to be useful for all sorts of other survival techniques hunters and gatherers use. See e.g. Scott Alexander's Book review of "The Secret of our success".
- If there were only individual benefits for intelligence but it was not very useful otherwise then over long timelines group selection[1] would actually select against smarter humans because their neurons would use up more metabolic energy.
However, there's a possibly very big piece of evidence for H3: Humans are both the smartest land animals and have the best interface for using tools, and that would seem like a suspicious coincidence.
I think this is not a coincidence but rather that tool use let humans fall into an attractor basin where payoffs of intelligence were more significant.
- ^
I mean group selection that could potentially be on a level of species where species go extinct. Please lmk if that's actually called differently.
(Major edits added on 2024-11-29.)
Some of my own observations and considerations:
Anecdotal evidence for orca intelligence
(The first three anecdotes were added 2024-11-29.)
- Orcas leading an orca researcher on a boat 15 miles home through the fog. (See the 80s clip starting from 8:10 in this youtube video.)
- Orcas can use bait.
- An orca family hunting a seal can pretend to give up and retreat and when the seal comes out thinking it's safe then BAM one orca stayed behind to catch it. (Told by Lance Barrett-Lennard somewhere in this documentary.[1])
- Intimate cooperation between native Australian whale hunters and orcas for whale hunting around 1900 (New South Wales).
- Orcas being skillful at turning boats around and even sinking a few vessels[2][3].
- Orcas have a wide variety of cool hunting strategies. (e.g. see videos (1, 2)). I don't know how this compares to human hunter gatherers. (EDIT: Ok I just read Scott Alexander's Book review of "The Secret of our success" and some anecdotes on hunter gatherers there seem much more impressive. (But also plausible to me that other orca hunting techniques are also more sophisticated than the examples but in ways it might not be legible to us.))
(ADDED 2024-11-10[4]: Tbc, while this is more advanced than I'd a priori expected from animals, the absence of observations of even more clearly stunning techniques is some counterevidence of orcas being smarter than humans. Though I also can't quite point to an example of something I'd expect to see if orcas were actually 250 IQ but don't observe, but I also didn't think for long and maybe there would be sth.)
(Mild counterevidence added 2024-12-02:)
- Btw it's worth noting that orcas do sometimes get tangled up in fishing gear or strand (and die of that), though apparently less frequently than other cetaceans, though didn't check precisely whether it's really less per individual.
- Worth noting that there are only 50000-100000 orcas in the world, which is less than for many other cetacean species, though not sure whether it's less in terms of biomass.
Orca language
(EDIT: Perhaps just skip this orca language section. Relevant is that orca language is definitely learned and not innate. Otherwise not much is known, except that we can eyeball the complexity of their calls. You could take a look by listening[5] here.[6] I'd say it seems very slightly less complex than in humans (though could be more) and much more complex than what is observed in other land animals.)
(Warning: Low confidence. What I say might be wrong.)
I didn't look deep into research into orca language (not much more than watching this documentary), my impression is that we don't know much yet.
Some observations:
- Orca language seems to be learned, not innate. Different regions have different languages and dialects. Scientists seem to analogize it to how humans speak different languages in different countries.
- For some orca groups that were studied, scientists were able to cluster their calls into 23 or 24 different call clusters, but still with significant variation of calls within a call cluster.
- (I do not know how tightly calls are clustered, or whether there often are outliers.)
- Orcas communicate a lot. (This might be wrong but I think they spend a significant fraction of their time socializing where they exchange multiple calls per minute.)
- (Orcas emit clicks and whistles. The clicks are believed to be for spatial navigation (especially in the dark), the whistles for communication.) (EDIT: Actually also pulsed calls, which I initially lumped in with whistles but are emitted in pulses. Those are probably the main medium of communication.)
I'd count (2) as some weakish evidence against orcas having as sophisticated language as humans, however not very strongly. Some considerations:
- Sentences don't necessarily need to be formed through having temporal sequences of words, but words could also be some different frequency signals or so which are then simultaneously overlaid.
- (The different 24 call types could be all sorts of things. E.g. conveying what we convey through body language, facial expressions, and tone. Or e.g. different sentence structures. Idk.)
- Their language might be very alien. I only have shitty considerations here but e.g.:
- Orca language doesn't need to have at all similar grammar. E.g. could be something as far from our language as logic programming is, though in the end still not nearly that simple.
- Orcas might often describe situations in ways we wouldn't describe them. E.g. rather about what movements they and their prey executed or sth.
- Orcas might describe more precisely where in 3D water particular orcas and animals were located, and they might have a much more efficient encoding for that than if we tried to communicate this.
More considerations
The main piece of evidence that makes me wonder whether orcas might actually be significantly smarter than humans is their extremely impressive brain. I think it's pretty strong though.
As mentioned, orcas have 2.05 times as many neurons in their neocortex as humans, and when I look through the wikipedia list (where I just trust measured[7] and not estimated values), it seems to be a decent proxy for how intelligent a species is.
There needs to be some selection pressure for why they have 160 times more neurons in their neocortex than e.g. brown bears (which weigh like 1/8th of an orca or so). Size alone is not nearly a sufficient explanation.
It's plausible that for both humans and orcas the relevant selection pressure mostly came from social dynamics, and it's plausible that there were different environmental pressures. (I'm keen to learn.) It's possible that what caused humans to be smart more strongly incentivized our brains to be able to do abstract reasoning, whereas for orcas it might've been useful for some particular skills that generalize less well for doing other stuff.
If I'd only ever seen hunter gatherer humans, even if I could understand their language, I'm not sure I'd expect that species to be able to do science on priors. But humans are able to do it. Somehow our intelligence generalized far outside the distribution we were optimized on. I don't think that doing science is similar to anything we've been optimized on, except that advanced language might be necessary.
On priors I wouldn't really see significant reasons why whatever selection pressures optimized orcas to have their astounding brains, would make their intelligence generalize less well to doing science, than whatever selection pressures produced our impressive human brains.
One thing that would update me significantly downwards on orcas being able to do science is if their prefrontal cortex doesn't contain that many neurons. (I didn't find that information quickly so please lmk if you find it.) Humans have a very large prefrontal cortex compared to other animals. My guess would be that orcas have too, and that they probably still have >1.5 times as many neurons in their prefrontal cortex as humans, and TBH I wouldn't even be totally shocked if it's >2.5 times. (EDIT: The cortex of the cetacean brain is organized differently than in most mammals and AFAIK we currently cannot map functionality very well.)
(Read my comments below to see more thoughts.)
- ^
You might need a VPN to canada to watch it.
- ^
Btw there is no recorded case of a human having been killed by an orca in the wild, including when they needed to swim when the vessel was sunk. (Even though orcas often eat other mammals.) (I think I even once heard it mentioned that it seemed like the orcas made sure that no humans died from their attacks, though I don't at all know how active the role of the orcas was there (my guess is not very).)
- ^
I'd consider it plausible that they were trying to signal us to please stop fishing that much, but I didn't look nearly deeply enough into it to judge.
- ^
Actually I don't remember exactly when I added this. I still think it's true but to a weaker extent than I originally thought.
- ^
Or downloading the files and looking at the spectrogram in e.g. audacity.
- ^
If you want to take a deeper look, here are more recordings.
- ^
Aka optical or isotropic fractionator in the method column.
What's the ect? Or do you have links for where to learn more? (What's the name of the field?)
(I thought wikipedia would give me a good overview but your list was already more useful to me.)
Thanks. No I didn't. (And probably don't have time to look into it but still nice to know.)
Justification for this:
I don't think organisms end up with 40 billion cortical neurons without either some strong selection for at least some sub-dimensions of intelligence, or being as big as Godzilla.
One could naively expect that the neuron count of sensory processing modules (especially touch and motor) is proportional to the surface area of an organism. However, I think this is unrealistic: bears don't need nearly as fine precision on which square centimeter of skin was touched (or by how many millimeters the paw moves) as mice do, generally because precision gets less relevant with larger body size.
So let's say the precision an organism needs, i.e. the spacing between sensors, is proportional to the square root of the 1-dimensional size (aka sqrt(surface_area)) of the organism. So if a mouse is 5cm tall and a bear 2m, the spacing between sensors on the mouse skin vs on the bear skin would scale as sqrt(0.05) vs sqrt(2). The number of sensors on the skin surface is the surface area divided by the square of the spacing between sensors, so the overall number of sensors is proportional to the 1-dimensional size (aka sqrt(surface_area)).
A brown bear has 250 million neurons in the neocortex and is maybe 2m tall. So to get to 40 billion neurons just by scaling size, an organism would have to be 40/0.25 * 2m = 320m tall. So actually bigger than Godzilla.
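The arithmetic above can be sketched in a few lines. This is only a toy model under the stated assumption that sensor spacing grows with the square root of body size; the function names and the constant `k` are made up for illustration, and the neuron and size figures are just the rough numbers quoted above.

```python
# Toy version of the scaling argument. All figures are the rough numbers
# quoted in the text, not measurements.

def sensors_for_size(size_m: float, k: float = 1.0) -> float:
    """Sensor count under the assumption spacing ~ sqrt(size).

    surface_area ~ size**2 and spacing ~ sqrt(size), so
    count ~ surface_area / spacing**2 ~ size.
    """
    surface_area = size_m ** 2
    spacing = k * size_m ** 0.5  # k is an arbitrary proportionality constant
    return surface_area / spacing ** 2


def size_needed(target_neurons: float, ref_neurons: float, ref_size_m: float) -> float:
    """Body size needed to reach target_neurons if neurons scale linearly with size."""
    return target_neurons / ref_neurons * ref_size_m


bear_neurons = 250e6  # neocortical neurons of a brown bear (figure from the text)
bear_size_m = 2.0     # rough bear height used in the text
target = 40e9         # the 40 billion cortical neurons figure

required = size_needed(target, bear_neurons, bear_size_m)
print(f"required size: {required:.0f} m")
```

Under these assumptions a 2m bear needs about 40 times as many skin sensors as a 5cm mouse (not 1600 times, as pure surface-area scaling would suggest), and scaling the bear up to 40 billion neurons gives the 320m figure.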
I don't think I'm the right person to look into this.
I just updated quickly via conservation of expected probability. (I agree though that I'd be a bit concerned about most people updating that quickly. If you think I've gone slightly psychotic please bet with me so I update harder if I notice you're right.)
(EDIT: actually it's sorta shitty because we might not get more evidence because I have even more important things to do and probably don't have time to look into it myself, but i'm happy to bet, though I'd probably want to revise my betting probability.)
I'm happy to bet on "By the end of 2034, does Tsvi think that it's >60% likely that orcas could do superhuman science if they had similar quality and quantity of science education as scientists and were motivated for this, conditional on Tsvi having talked to me for at least 2 hours about this {sometime in 2030-2034}/{when I might have more evidence}?"
I'd currently be at like 26% on this, though if you take more time to think about it I might adjust the estimate I'm willing to bet on.
I'm happy to bet up to 200$ per bit (or maybe more but would have to think about it). Aka if it's resolved "Yes", money flowing from you to me would be (and if it's resolved "No" it would be ). (Where negative money flow indicates flow into the other direction.)
(Also obviously you'd need to commit to talking to me for 2h sometime when i have more evidence, and not just avoid resolution by not talking to me.)
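A minimal sketch of how a per-bit bet like this could be settled, assuming the usual logarithmic scoring rule: the payoff is the stake per bit times the log2 ratio of the two parties' probabilities for the outcome that actually occurred. The function name, the scoring rule itself, and the other party's example probability of 13% are assumptions for illustration, not terms stated above.

```python
import math


def payoff_to_me(p_me: float, p_you: float, resolved_yes: bool,
                 stake_per_bit: float = 200.0) -> float:
    """Money flowing from you to me under an (assumed) log-scoring bet.

    p_me and p_you are each party's probability of "Yes".
    A negative result means money flows from me to you.
    """
    if resolved_yes:
        ratio = p_me / p_you
    else:
        ratio = (1 - p_me) / (1 - p_you)
    return stake_per_bit * math.log2(ratio)


# Hypothetical example: I say 26%, you say 13%.
if_yes = payoff_to_me(0.26, 0.13, resolved_yes=True)
if_no = payoff_to_me(0.26, 0.13, resolved_yes=False)
print(f"Yes: {if_yes:+.2f}$, No: {if_no:+.2f}$")
```

With this rule, whoever assigned more probability to the actual outcome gets paid, and nothing changes hands if both parties gave the same probability.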
I don't know what you mean by this:
The thing is, there's probably gonna be like ten other posts in the reference class of this post, and they just... don't leave much of a dent in things?
- I don't think it makes that much sense to just look at cortical neuron counts. Big bodies ask for many neurons, including cortical motor neurons. Do cetaceans have really big motor cortices? Visual cortices? Olfactory bulbs? Keyword "allometry". Yes, brains are plastic, but that doesn't mean orcas are actually ever doing higher mathematics with their brains.
See this comment.
- Scale matters, but I doubt it's very close to being the only thing! Humans likely had genetic adaptations for neuroanatomical phenotypes selected-for by some of: language; tool-making; persisting transient mental content; intent-inference; intent-sharing; mental simulation; prey prediction; deception; social learning; teaching; niche construction/expansion/migration. Orcas have a few of these. But how many, how much, for how long, in what range of situations and manifestations?
I already considered this. (I just posted a question about this.) I don't have good information on to what extent orcas have those, but my guesses are already reflected in my overall guess in the post.
Why do you think orcas have few of those? For me it seems plausible that orcas have everything except tool use and niche construction.
I do think there was some significant selection for some kind of intelligence in dolphins and orcas - the main question here is whether being optimized on tool use (IF that was a significant driver in what selected humans for intelligence) would be significantly more useful for having the brain potential generalize to doing science than if the brains were optimized because of social dynamics or hunting strategies.
But of course there are other considerations like "maybe you need fully recursive language to be able to have the abstract reasoning take off, and this might very well come from some adaptations that are not just about neuron counts, and maybe orcas don't have that".
I already took all my current uncertain consideration on this into account when I said "50% that they would be superhuman at science if they had similar quality and quantity of science education as scientists and were motivated for this".
Or do you think a cow brain scaled to 40 billion neurons would be superhuman?
I don't know what you're asking for here. I don't think organisms end up with 40 billion cortical neurons without either some strong selection for at least some sub-dimensions of intelligence, or being as big as Godzilla.
I'm not really excited about just smushing together more brain tissue without it having been optimized to work well together, but orca brains were optimized.
- Culture matters. The Greeks could be great philosophers... But could a kid living in 8000 BCE, who gets to text message with an advanced alien civilization of kinda dumb people, become a cutting edge philosopher in the alien culture? Even though almost everyone ze interacts with is preagricultural, preliterate? I dunno, maybe? Still seems kinda hard actually?
Yep that's why I'm only at like 15% that we get very significant results out of it in the next 30 years even if we tried hard. (aka 30% conditional on orcas being smart enough.)
- Superbabies is good. It would actually work. It's not actually that hard. There's lots of investment already in component science/tech. Orcas doesn't scale. No one cares about orcas. There's not hundreds of scientists and hundreds of millions in orca communications research. Etc. The sense of this plan being weird is a good sense to investigate further. It's possible for superficial weirdness to be wrong, but don't dismiss the weirdness out of hand.
I mean if Orcas are smarter they might be super vastly smarter so you wouldn't need that many.
Superbabies would work well given multiple generations but also only like 30% that we'd get +7std humans born within 10 years even if we tried similarly hard[1], and I think it's pretty unlikely we have more than 40 years left without strong governance success. (E.g. afaik we still have problems cloning primates well (even though it's been a thing for long) and those are just sub-difficulties[2] of e.g. creating superbabies through repeated embryo selection.)
I think number of neurons in neocortex (or even more prefrontal cortex - but unfortunately i didn't quickly find how big the orca prefrontal cortex is - though I'd guess it to still be significantly bigger than for humans) is a much much better proxy for intelligence of species than brain size (or encephalization quotient). (E.g. see the wikipedia list linked in my question here.)
(Also see here. There are more examples, e.g. a blue-and-yellow macaw has 1.9 billion, whereas brown bears have only 250 million.)
EDIT: Tbc I do think that larger bodies require more neurons in touch-sense and motor parts of the neocortex, so there is some effect of how larger animals need a bit larger brains to be similarly smart, but I don't think this effect is very strong.
But yeah there are other considerations too, which is why I am only at 50% that orcas could do science significantly better than humans if they tried.
I feel like life-force seems like a sensation that's different from what I'd expect from just having a thing in the world model with inherent surprisingness and ends-without-trajectory-predictions/"optimizerness" attached. ("Life-force" sounds more like "as if the thing had a soul" to me. I do not understand where this comes from but I don't see how I'd predict such a sensation in advance given just the inherent-surprisingness + optimizerness hypothesis.)
Thanks for communicating your model well again!
I think we might mostly agree, but let's clarify.
I agree with all of:
In the course of predicting them well, the world-model invents some slightly-higher-level concept (or family of closely-interlinked concepts) that we call “cold”. And it notices and memorizes predictively-useful relationships between this new “cold” concept and other things in the world-model, e.g. shivering and ice.
I don’t think there’s more to the concept “cold” than the sum total of its associations with every other concept, with sensory input, and with motor output.
I also basically agree with:
I like to draw the distinction between understanding learning algorithms and understanding trained models. The former is kinda like what you learn in an ML course (gradient descent, training data, etc.) , the latter is kinda like what you learn in a mechanistic interpretability paper. I don’t think it’s realistic to “write code” for the “cold” concept, because I think it (like all concepts) emerges at the trained model level. It emerges from a learning algorithm, training environment, loss function, etc.
I agree that fully writing code would be quite a daunting task. I think my phrasing of "write code" was not great. But it's already some reductionist progress if you have something like:
if coldness concept gets more activated: increase activation of shivering anticipation; weakly increase activation of snow concept; ...
I don't think it's a worthwhile exercise to get very precise.
An important point I wanted to make here is just that the meaning of "cold" comes from the interactions with other concepts, and there's no such thing as an inherent independent meaning of the word "cold". (So when I hear 'If we look at naturalistic visual inputs that directly or indirectly trigger C, and they’re disproportionately pictures of clocks, then that’s some evidence that C “means” clock.' this seems a bit off to me, though not too bad.)
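A minimal sketch of that kind of rule, written out as a toy spreading-activation update. The association table, its weights, and the function name are all made up for illustration; this is not a claim about how the brain implements it.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical association weights: how strongly activating one concept
# pushes activation onto another. The numbers are invented.
ASSOCIATIONS = {
    "cold": {"shivering_anticipation": 0.8, "snow": 0.2, "ice": 0.3},
}


def spread(activations: dict[str, float], source: str, delta: float) -> dict[str, float]:
    """Raise `source` by `delta` and push a weighted share onto its associates."""
    updated = defaultdict(float, activations)
    updated[source] += delta
    for target, weight in ASSOCIATIONS.get(source, {}).items():
        updated[target] += weight * delta
    return dict(updated)


state = spread({}, "cold", 1.0)
print(state)
```

Renaming the key `"cold"` to an arbitrary token changes nothing about the behaviour of this toy model, which is the sense in which the meaning sits in the associations rather than in the label.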
I guess I best try to explain why I felt some unease with your initial description of the cold example:
Suppose somebody said:
There’s a certain kind of interoceptive sensory input, consisting of such-and-such signal coming from blah type of thermoreceptor in the peripheral nervous system. Your brain does its usual thing of transforming that sensation into its own “color” of “metaphysical paint” (as in §3.3.2) that forms a concept / property in your conscious awareness and world-model, and you know it by the everyday term “cold”.
On the one hand, I would defend this passage as basically true.
Basically I think that some people - though a priori not you - would think that sth like "i feel cold because the cold-thermoreceptors activate the corresponding cold concept" explains their sense of cold. However, if you just take this hypothesis, which basically is "some sensors activate some concept", without anything else, then the concept would be completely shapeless and uninterpretable - unrelated to anything known.
I now think you probably didn't mean it in nearly that bad a way, but I'm not sure.
(But some parts of what you write seem to me like you have slightly weaker sensors about "how does a hypothesis actually constrain my anticipations / concentrate probability mass" or "what would this hypothesis predict if I didn't already know how I perceive it", and I do think those sensors are useful.)
(I also think that there is some hypothalamus-or-so business logic for what responses to trigger (e.g. shivers) from significant cold input signals that would need to be figured out if you want to get a good model of freezing/feeling-uncomfortably-cold, but that's about freezing in particular and not temperature as a property we model on objects.)
The post is not joking. (But thanks for feedback that you were confused in this way!)
I basically didn't know much about orcas before I learned 3 days ago that they have 2.05 times as many neurons in the neocortex as humans, and then yesterday and the day before I spent looking into how smart orcas might be and evaluating the evidence. So I'm far from an expert, but I also didn't run across strong evidence that they are dumber than hunter-gatherer humans, and did find some weak-to-medium evidence that they might be at least as smart as 15 year olds. But it's still possible that orca researchers have observed orcas not finding some strategies they would've thought of or sth.
But yeah the only piece of evidence that they might be significantly smarter than humans is their brains. I consider it reasonably strong though.
I edited "Though I don't know that much about orcas" to "Though I only tried to form a model of orca intelligence for 1-2 days". thanks!
Hi Steve, I didn't read this post yet and just wanted to ask whether it's still worth reading or whether everything relevant is now better in "incentive learning and dead sea salt experiment"?
In the wikipedia list, the estimated number of neurons in the neocortex of a blue whale is 5 billion (compared to 43 billion in orcas), even though blue whales are much larger. (Unfortunately the blue whale estimate is just an estimate and not grounded in optical or isotropic fractionation measurements.)
(EDIT: Hm interesting, the linked reddit post mentions 15 billion for blue whales. Not sure what is correct.)
I agree that memory and beliefs are in some sense optional addons. I don't understand precisely enough yet how we model animals.
On your section on cold:
First, I'm still not sure in what way you're using "cold" of the two interpretations I indicated here: "(where I assume you talk about mental-physiological reactions to freezing/feeling-cold (as opposed to modelling the temperature of objects))".
But in either case I mostly just mean that having a full reductionist explanation of e.g. cold is an extremely high standard that ought to fulfill the following criteria:
- You can replace the word "cold" and other related abstract words with some other token-sequences/made-up-words, and someone who had a sufficiently good understanding would still be able to figure out that the new made-up-word corresponds to the concept we call "cold".
- (Where I don't think your explanation had something in it where you couldn't just replace "cold" with "heat" or "redness" (except redness wouldn't work if we allow "thermoreceptor" but I'd also want to rename this to "receptor-type-abc").)
- You can sorta write code for a relevant part of what's happening in the mind when e.g. the freezing emotion/sensation is triggered.
- (Like you would not need to describe a fully conscious program, but the function that triggers how muscles contract and the sensation of wanting to curl up and the skin shivering and causes a negative hedonic tone as well as instantiating a subgoal of getting thermoreceptors to report higher temperature or sth. Like I'd count this description as a weak reductionist hypothesis (which makes progress on unpacking the "cold" concept but where there are more levels of unpacking to do), though it might be very incomplete and partially wrong.)
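As a rough illustration of the second criterion, here is what such a weak reductionist hypothesis could look like as a function. Everything in it (the threshold, the field names, the linear scaling) is a placeholder assumption, and it is surely very incomplete and partially wrong.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Response:
    muscle_contraction: float = 0.0  # 0 = relaxed
    shivering: bool = False
    urge_to_curl_up: bool = False
    hedonic_tone: float = 0.0        # negative = unpleasant
    subgoals: list[str] = field(default_factory=list)


def freezing_response(cold_signal: float, threshold: float = 0.5) -> Response:
    """Map thermoreceptor input to the bundle of reactions described above.

    `cold_signal` and `threshold` are in arbitrary made-up units.
    """
    if cold_signal <= threshold:
        return Response()
    excess = cold_signal - threshold
    return Response(
        muscle_contraction=excess,
        shivering=True,
        urge_to_curl_up=True,
        hedonic_tone=-excess,
        subgoals=["get thermoreceptors to report higher temperature"],
    )


print(freezing_response(0.9))
```

Each field is itself a concept with more levels of unpacking to do, so this only counts as a first layer.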
Like I'm not sure we disagree much here. I think everything you said is correct, but I feel like emphasizing that there are still more layers of understanding that need to get unpacked, and that saying "it's a concept that's useful to predict sensory data" still leaves open questions of what exactly the information is that the concept has the ability to communicate, or of how the concept relates to other concepts.
Thanks for being so wonderfully precise to make it easy for me to reply!
The part where you lose me is here:
Meanwhile, in our everyday experience, we all have an intuitive sense of animation / agency.
Where does this sense of agency come from? Likewise:
When we do this kind of analysis well, we’ll wind up describing every aspect of our actual everyday intuitions around animation / agency / alive-ness, and predicting all the items in §3.3.
How do we get from something seeming inherently surprising to something seeming agentic or imbued with life-force?
EDITED TO ADD: Tbc I think you can explain agency (though not life-force, and you need to be careful to only interpret agency in this limited sense) through being able to predict outcomes without trajectories (as you also seem to have realized, as in "(derived from a pattern where I can make medium-term predictions despite short-term surprise)"). I wouldn't equate agency with inherent surprisingness though, although they often occur together.
Hm interesting. I mean I'd imagine that if we get good heuristic guarantees for a system it would basically mean that all the not-perfectly-aligned subsystems/subsearches are limited and contained enough that they won't be able to engage in RSI. But maybe I misunderstand your point? (Like maybe you have specific reason to believe that it would be very hard to predict reliably that a subsystem is contained enough to not engage in RSI or so?)
(I think inner alignment is very hard and humans are currently not (nearly?) competent enough to figure out how to set up training setups within two decades. Like for being able to get good heuristic guarantees I think we'd need to figure out at least something sorta like the steering subsystem which tries to align the human brain, only better because it's not good enough for smart humans I'd say. (Though Steven Byrnes' agenda is perhaps a UANFSI approach that might have sorta a shot because it might open up possibilities of studying in more detail how values form in humans. Though it's a central example of what I was imagining when I coined the term.))
How bottlenecked is your agenda by philosophy skills (like being good at thought experiments for deriving stuff like UDT, or like being good at figuring out the right ontology for thinking about systems or problems) vs math skill vs other stuff?
Idk that could be part of finding heuristic arguments for desirable properties for what a UANFSI converges to. Possibly it's easier to provide probabilistic convergence guarantees for systems that don't do FSI so this would already give some implicit evidence. But we could also just say that it's fine if FSI happens as long as we have heuristic convergence arguments - like that UANFSI is just allowing for a broader class of algorithms which might make stuff easier - though I mostly don't expect we'd get FSI alignment through this indirect alignment path from UANFSI but that we'd get an NFSI AI if we get some probabilistic convergence guarantees.
(Also I didn't think much about it at all. As said I'm trying KANSI for now.)
(I don't fully understand yet what results you're aiming for, but yeah makes sense that probabilistic guarantees make some stuff more feasible. Not sure whether there might be more relaxations I'd be fine to at least initially make.)