I hesitated between Koyaanisqatsi and Baraka! Both are some of my favorites, but in my view Koyaanisqatsi actually has notably more of an agenda and a more pessimistic outlook.
Baraka: A guided meditation exploring the human experience; topics like order/chaos, modernity, green vs. other MtG colours.
More than being "connected to something in the Sequences", it is connected to something which straw Sequences-style rationality is prone to miss. The writings it resonates more with are Meditations on Moloch, The Goddess of Everything Else, The Precipice.
There isn't much to spoil: it's a 97-minute nonverbal documentary. I would highly recommend watching it on as large a screen and in as good quality as you can; watching it on a small laptop screen is a waste.
A Central European experience, which is unfortunately becoming relevant also for the current US: for world-modelling purposes, you should have hypotheses like 'this thing is happening because of a Russian intelligence operation' or 'this person is saying what they are saying because they are a Russian asset' in your prior with nontrivial weights.
I expected a quite different argument for empathy:
1. Argument from simulation: the most important part of our environment is other people; people are very complex and hard to predict; fortunately, we have hardware which is extremely good at 'simulating a human': our individual brains. To guess what another person will do, or why they are doing what they are doing, it seems clearly computationally efficient to just simulate their cognition on my brain. Fortunately for empathy, simulations activate some of the same proprioceptive machinery and goal-modeling subagents, so the simulation leads to similar feelings.
2. Mirror neurons: it seems we have a powerful dedicated system for imitation learning, which is extremely advantageous for overcoming the genetic bottleneck. Mirroring activation patterns leads to empathy.
My personal impression is you are mistaken and innovation has not stopped, but part of the conversation has moved elsewhere. E.g. taking just ACS, we have ideas from the past 12 months which in our ideal world would fit into this type of glossary: free energy equilibria, levels of sharpness, convergent abstractions, gradual disempowerment risks. Personally I don't feel it is a high priority to write them up for LW, because they don't fit the current zeitgeist of the site, which seems to direct a lot of attention mostly to:
- advocacy
- topics a large crowd cares about (e.g. mech interpretability)
- or topics some prolific and good writer cares about (e.g. people will read posts by John Wentworth)
Hot take, but the community loosely associated with active inference is currently a better place to think about agent foundations; workshops on topics like 'pluralistic alignment' or 'collective intelligence' have, in total, more interesting new ideas about what was traditionally understood as alignment; and parts of AI safety went fully ML-mainstream, with the fastest conversation happening on X.
Seems worth mentioning the SOTA, which is https://futuresearch.ai/. Based on the competence & epistemics of the Futuresearch team, and the fact that their bot gets very strong but not superhuman performance, I roll to disbelieve that this demo is actually way better and predicts future events at a superhuman level.
Also, I think it is generally bad practice to not mention or compare to the SOTA, and to just cite your own prior work. Shame.
I'm skeptical of the 'wasting my time' argument.
A stance like 'going to poster sessions is great for young researchers; I don't do it anymore and just meet friends' is high-status, so, on priors, I would expect people to adopt it more than is optimal.
Realistically, a poster session is ~1.5h, maybe 2h with skimming to decide what to look at. It is relatively common for people in AI to spend many hours per week digesting the news on Twitter. I really doubt the per-hour efficiency of following Twitter is better than that of poster sessions when approached intentionally. (While obviously aimlessly wandering between endless rows of posters is approximately useless.)
Corrected!
I broadly agree with this - we tried to describe a somewhat similar set of predictions in Cyborg periods.
Surprised you haven't heard about any facilitated communication tools.
A few thoughts:
- actually, these considerations mostly increase uncertainty and variance about timelines; if LLMs are missing some magic sauce, it is possible that smaller systems with the magic sauce could be competitive, and we could get really powerful systems sooner than Leopold's lines predict
- my take on one important thing which makes current LLMs different from humans is the gap described in Why Simulator AIs want to be Active Inference AIs; while that post intentionally avoids having a detailed scenario part, I think the ontology introduced is better for thinking about this than scaffolding
- not sure if this is clear to everyone, but I would expect the discussion of unhobbling to be one of the places where Leopold would need to stay vague to not breach OpenAI confidentiality agreements; for example, if OpenAI was putting a lot of effort into making LLM-like systems better at agency, I would expect he would not describe specific research and engineering bets
Agreed we would have to talk more. I think I mostly get the homunculi objection. Don't have time now to write an actual response, so here are some signposts:
- part of what you call agency is explained by a roughly active-inference style of reasoning
-- some types of "living" systems are characterized by having boundaries between themselves and the environment (boundaries mostly in the sense of separation of variables)
-- maintaining the boundary leads to a need to model the environment
-- modelling the environment introduces a selection pressure toward approximating Bayes
- the other critical ingredient is boundedness
-- in this universe, negentropy isn't free
-- this introduces fundamental tradeoff / selection pressure for any cognitive system: length isn't free, bitflips aren't free, etc.
(--- downstream of that is compression everywhere, abstractions)
-- empirically, the cost/returns function for scaling cognition usually hits diminishing returns, leading to minds where it's not effective to grow the single mind further
--- this leads to the basin of convergent evolution I call "specialize and trade"
-- empirically, for many cognitive systems, there is a general selection pressure toward modularity
--- I don't know all the reasons for that, but one relatively simple one is 'wires are not free'; if wires are not free, you get colocation of computations, like brain regions or industry hubs
--- other possibilities are selection pressures from CAP theorem, MVG, ...
(modularity also looks a bit like box-inverted specialize and trade)
So, in short: I agree with the spirit of 'If humans didn't have a fixed skull size, you wouldn't get civilization with specialized members', and my response is that there seems to be an extremely general selection pressure in this direction. If cells were able to just grow in size and it was efficient, you wouldn't get multicellulars. If code bases were able to just grow in size and it was efficient, I wouldn't get a myriad of packages on my laptop; it would all be just one kernel. (But even if it were just one kernel, it seems modularity would kick in and you would still get the 'distinguishable parts' structure.)
That's why solving hierarchical agency is likely necessary for success.
(crossposted from twitter) Main thoughts:
1. Maps pull the territory
2. Beware what maps you summon
Leopold Aschenbrenner's series of essays is a fascinating read: there are a ton of locally valid observations and arguments. A lot of the content is the type of stuff mostly discussed in private. Many of the high-level observations are correct.
At the same time, my overall impression is that the set of maps sketched pulls toward existential catastrophe, and this is true not only for the 'this is how things can go wrong' part, but also for the 'this is how we solve things' part. Leopold is likely aware of this angle of criticism, and deflects it with 'this is just realism' and 'I don't wish things were like this, but they most likely are'. I basically don't buy that claim.
You may be interested in 'The self-unalignment problem' for some theorizing https://www.lesswrong.com/posts/9GyniEBaN3YYTqZXn/the-self-unalignment-problem
Mendel's Laws seem counterfactual by about ~30 years, based on partial re-discovery taking that much time. His experiments are technically something which someone could have done basically any time in the last few thousand years, given basic maths.
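The "basic maths" really is just counting combinations. A minimal sketch (my own illustration, not from the comment) of the monohybrid cross behind the classic 3:1 ratio:

```python
import itertools

# Aa x Aa cross: an offspring gets one allele from each parent,
# each combination equally likely.
parents = ("Aa", "Aa")
offspring = ["".join(sorted(pair)) for pair in itertools.product(*parents)]

# Genotypes AA, Aa, Aa, aa -> the 3:1 dominant:recessive phenotype
# ratio Mendel observed in his pea counts.
dominant = sum("A" in genotype for genotype in offspring)
print(offspring, f"{dominant}:{len(offspring) - dominant}")
```

Anyone with paper, patience, and enough pea plants could have tabulated this at almost any point in history; the math is elementary.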
I do agree the argument "We're just training AIs to imitate human text, right, so that process can't make them get any smarter than the text they're imitating, right? So AIs shouldn't learn abilities that humans don't have; because why would you need those abilities to learn to imitate humans?" is wrong and clearly the answer is "Nope".
At the same time I do not think parts of your argument in the post are locally valid or good justification for the claim.
A correct and locally valid argument for why GPTs are not capped at human level was already written here.
In a very compressed form: you can just imagine that GPTs have text as their "sensory inputs", generated by the entire universe, similarly to you having your sensory inputs generated by the entire universe. Neither human intelligence nor GPTs are constrained by the complexity of the task (also: in the abstract, it's the same task). Because of that, "task difficulty" is not a promising way to compare these systems, and it is necessary to look into actual cognitive architectures and bounds.
With the last paragraph, I'm somewhat confused by what you mean by "tasks humans evolved to solve". Does e.g. sending humans to the Moon, or detecting Higgs boson, count as a "task humans evolved to solve" or not?
I sort of want to flag that this interpretation of whatever gossip you heard seems misleading, or only tells a small part of the story, based on my understanding.
I would imagine I would also react to it with a smile in the context of an informal call. When used as a brand / "fill in the interest form here", I just think it's not a good name, even if I am sympathetic to proposals to create more places to do big-picture thinking about the future.
Sorry, but I don't think this should be branded as "FHI of the West".
I don't think you personally, or Lightcone, share that much intellectual taste with FHI or Nick Bostrom - Lightcone seems firmly in the intellectual tradition of Berkeley, shaped by orgs like MIRI and CFAR. This tradition was often close to FHI thought, but also quite often in tension with it. My hot take is you in particular miss part of the generators of the taste which made FHI different from Berkeley. I sort of dislike the "FHI" brand being used in this way.
edit: To be clear, I'm strongly in favour of creating more places for FHI-style thinking; I just object to the branding / "let's create a new FHI" frame. Owen expressed some of the reasons better and in more depth.
You are exactly right that active inference models which behave in a self-interested or any coherently goal-directed way must have something like an optimism bias.
My guess about what happens in animals and to some extent humans: part of the 'sensory inputs' are interoceptive, tracking internal body variables like temperature, glucose levels, hormone levels, etc. Evolution already built a ton of 'control theory type circuits' on top of the body (even how to build a body from a single cell is an extremely impressive optimization task...). This evolutionarily older circuitry likely encodes a lot about what evolution 'hopes for' in terms of what states the body will occupy. Subsequently, when building predictive models and turning them into active inference, my guess is a lot of the specification is done by 'fixing priors' of interoceptive inputs on values like 'not being hungry'. The later-learned structures then also become a mix between beliefs and goals: e.g. the fixed prior on my body temperature during my lifetime leads to a model where I get a 'prior' about wearing a waterproof jacket when it rains, which becomes something between an optimistic belief and a 'preference'. (This retrodicts that a lot of human biases could be explained as "beliefs" somewhere between "how things are" and "how it would be nice if they were".)
But this suggests an approach to aligning embedded simulator-like models: Induce an optimism bias such that the model believes everything will turn out fine (according to our true values)
My current guess is that any approach to alignment which actually leads to good outcomes must include some features suggested by active inference. E.g. active inference suggests that an 'aligned' agent which is trying to help me likely 'cares' about my 'predictions' coming true, and has some 'fixed priors' about me liking the results. This gives me something avoiding both 'my wishes were satisfied, but in bizarre goodharted ways' and 'this can do more than I can'.
- Too much value and too positive feedback on legibility. Replacing smart illegible computations with dumb legible stuff
- Failing to develop actual rationality and focusing on cultivation of the rationalist memeplex or rationalist culture instead
- Not understanding the problems with the theoretical foundations on which sequences are based (confused formal understanding of humans -> confused advice)
+1 on the sequence being one of the best things of 2022.
You may enjoy an additional/somewhat different take on this from population/evolutionary biology (and here). (To translate the map, you can think about yourself as a population of myselves. Or, in the opposite direction: from a gene-centric perspective, it obviously makes sense to think about the population as a population of selves.)
Part of the irony here is that evolution landed on the broadly sensible solution (geometric rationality). However, after almost every human doing the theory got somewhat confused by the additive, linear-EV rationality maths, what most animals (and often humans, at the S1 level) do got interpreted as 'cognitive bias' - in the spirit of assuming obviously stupid evolution was not able to figure out linear argmax-over-utility algorithms in a few billion years.
I guess the lack of engagement is caused by
- the relation between the 'additive' and 'multiplicative' pictures being deceptively simple in a formal way
- the conceptual understanding of what's going on and why being quite tricky; one reason, I guess, is that our S1 / brain hardware runs almost entirely in the multiplicative / log world, while people train their S2 understanding on the linear additive picture; as Scott explains, the maths formalism fails us
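The formal relation really is deceptively simple. A toy sketch (my own illustrative numbers) of how the two pictures diverge for a repeated bet:

```python
import math

# Toy repeated bet (illustrative numbers of my choosing): each round,
# wealth is multiplied by 2.0 with probability 0.5, or by 0.4 otherwise.
outcomes = [2.0, 0.4]
probs = [0.5, 0.5]

# Additive picture: expected wealth multiplier per round.
arith = sum(p * x for p, x in zip(probs, outcomes))

# Multiplicative picture: typical long-run growth factor per round
# (exponential of the expected log-multiplier).
geom = math.exp(sum(p * math.log(x) for p, x in zip(probs, outcomes)))

print(arith)  # 1.2    -> looks like a great bet
print(geom)   # ~0.894 -> almost every trajectory actually shrinks
```

Linear EV says take the bet every round; the geometric/log picture, which is what Kelly-style (and arguably evolution-style) rationality tracks, says a bankroll repeatedly exposed to it goes to zero almost surely.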
This is a short self-review, but with a bit of distance, I think understanding 'limits to legibility' is maybe one of the top 5 things an aspiring rationalist should deeply understand, and the lack of this leads to many bad outcomes in both the rationalist and EA communities.
In a very brief form: maybe the most common cause of EA problems and stupidities are attempts to replace illegible S1 boxes able to represent human values, such as 'caring', with legible, symbolically described, verbal moral reasoning subject to memetic pressure.
Maybe the most common cause of rationalist problems and difficulties with coordination are cases where people replace illegible smart S1 computations with legible S2 arguments.
In my personal view, 'Shard theory of human values' illustrates both the upsides and pathologies of the local epistemic community.
The upsides
- the majority of the claims are true, or at least approximately true
- "shard theory" as a social phenomenon reached a critical mass, making the ideas visible to the broader alignment community; this works e.g. via talking about them in person, votes on LW, a series of posts, ...
- shard theory coined a number of locally memetically fit names and phrases, such as 'shards'
- part of the success is that it leads some people in the AGI labs to think about the mathematical structure of human values, which is an important problem
The downsides
- almost none of the claims which are true are original; most of this was described elsewhere before, mainly in the active inference/predictive processing literature, or in thinking about multi-agent models of mind
- the claims which are novel usually seem somewhat confused (e.g. 'human values are inaccessible to the genome', or the naive RL intuitions)
- the novel terminology is incompatible with the existing research literature, making it difficult for the alignment community to find or understand existing research, and making it difficult for people from other backgrounds to contribute (while this is not the best option for the advancement of understanding, paradoxically, it may be positively reinforced in the local environment, as you get more credit for reinventing stuff under new names than for pointing to relevant existing research)
Overall, 'shards' become so popular that reading at least the basics is probably necessary to understand what many people are talking about.
My current view is this post is decent at explaining something which is "2nd type of obvious" in a limited space, using a physics metaphor. What is there to see is basically given in the title: you can get a nuanced understanding of the relations between deontology, virtue ethics and consequentialism using the frame of "effective theory" originating in physics, and using "bounded rationality" from econ.
There are many other ways to get this: for example, you can read hundreds of pages of moral philosophy, or do a degree in it. The advantage of this text is that you can take a shortcut and get the same thing via the metaphorical map from physics. The disadvantage is that understanding how effective theories work in physics is a prerequisite, which quite constrains the range of people to whom this is useful, and its broad appeal.
This is a great complement to Eliezer's 'List of lethalities', in particular because, in cases of disagreement, the beliefs of most people working on the problem were, and still mostly are, closer to this post. Paul writing it provided a clear, well-written reference point, and, together with many others expressing their views in comments and other posts, helped make the beliefs in AI safety more transparent.
I still occasionally reference this post when talking to people who, after reading a bit about the debate e.g. on social media, first form an oversimplified model of the debate in which there is some unified 'safety' camp vs. the 'optimists'.
Also, I think this demonstrates that 'just stating your beliefs' in a moderately-dimensional projection can be a useful type of post, even without much justification.
The post is influential, but makes multiple somewhat confused claims and led many people to become confused.
The central confusion stems from the fact that genetic evolution already created a lot of control circuitry before inventing the cortex, and did the obvious thing to 'align' the evolutionarily newer areas: bind them to the old circuitry via interoceptive inputs. By this mechanism, the genome is able to 'access' a lot of evolutionarily relevant beliefs and mental models. The trick is that the higher / more-distant-from-the-genome models are learned in part to predict interoceptive inputs (tracking the evolutionarily older reward circuitry), so they are bound by default, and there isn't much left to 'bind' independently. Anyone can check this... just thinking about a dangerous-looking person with a weapon activates the older, body-based fear/fight chemical regulatory circuits => the active inference machinery learned this and plans actions to avoid these states.
Speculative guess about the semantic richness: embeddings at distances like 5-10 are typical of concepts which are usually represented by multi-token strings. E.g. "spotted salamander" is 5 tokens.
I like the agree-disagree vote and the design.
With the content and votes...
- my impression is that until ~1-2 years ago LW had a decent share of great content; I disliked the average voting "taste vector", which IMO represented a somewhat confused taste in roughly the "dumbed-down MIRI views" direction. I liked many of the discourse norms
- not sure what exactly happened, but my impression is LW is now often just another battlefield in the 'magical egregore war zone'. (It's still way better than other online public spaces.)
What I mean by that is that a lot of people seemingly moved from 'let's figure out how things are' to 'the texts you write are elaborate battle moves in egregore warfare'. I don't feel excited about pointing to examples, but my impression is of a growing share of senior top-ranking users who seem hard to convince about anything, cannot be bothered to actually engage with arguments, and write either literal manifestos or in manifesto style.
(high-level comment)
To me, it seems this dialogue diverged a lot into the question of what is self-referential, how important that is, etc. I don't think that's the core idea of complex systems, and it does not seem to be a crux for anything in particular.
So, what are core ideas of complex systems? In my view:
1. Understanding that there is this other direction (complexity) physics can expand into; traditionally, physics has expanded across scales of space, time, and energy - starting from everyday scales of meters, seconds, and kilograms, and gradually understanding the world at more and more distant scales.
While this was super successful, on a careful look you notice that claims like 'we now understand deeply how the basic building blocks of matter behave' come with a * disclaimer/footnote like 'this does not mean we can predict anything if there are more of the blocks and they interact in nontrivial ways'.
This points to some other direction in the space of stuff to apply the physics way of thinking to than 'smaller', 'larger', 'higher energy', etc., and also different from 'applied'.
Accordingly, good complex systems science is often basically the physics way of thinking applied to complex systems. Parts of statistical mechanics fit neatly into this but, having been developed first, carry a somewhat specific brand.
Why this isn't done just under the brand of 'physics' is based on, in my view, an often problematic way of classifying fields by subject of study rather than by methods. I know of personal experiences of people who tried to do, e.g., the physics of some phenomena in economic systems, and had a hard time surviving in traditional physics academic environments ("does it really belong here if, instead of electrons, you are now applying it to some... markets?")
(This is not really strict; for example, decent complex systems research is often published in venues like Physica A, which is nominally about Statistical Mechanics and its Applications)
2. 'Physics' in this direction often stumbled upon pieces of math that are broadly applicable in many different contexts. (This is actually pretty similar to the rest of physics, where, for example, once you have the math of derivatives, or math of groups, you see them everywhere.) The historically most useful pieces are e.g., math of networks, statistical mechanics, renormalization, parts of entropy/information theory, phase transitions,...
Because of the above-mentioned (1.), it's really not possible to show 'how this is a distinct contribution of complex systems science, in contrast to just doing physics of nontraditional systems'. Actually, if you look at the 'poster children' of 'complex systems science'... my maximum likelihood estimate of their background is physics. (Just googled the authors of the mentioned book: Stefan Thurner... obtained a PhD in theoretical physics, worked on e.g. topological excitations in quantum field theories, statistics and entropy of complex systems. Peter Klimek... was awarded a PhD in physics. Albert-László Barabási... has a PhD in physics. Doyne Farmer... University of California, Santa Cruz, where he studied physical cosmology, etc. etc.) Empirically, they prefer the brand of complex systems over just physics.
3. Part of what distinguishes complex systems [science / physics / whatever ... ] is in aesthetics. (Also here it becomes directly relevant to alignment).
A lot of traditional physics and maths basically has a distaste toward working on problems which are complex, too much in the direction of practical relevance, too much driven by what actually matters.
The mentioned Albert-László Barabási got famous for investigating the properties of real-world networks, like the internet or transport networks. Many physicists would just not work on this because it's clearly 'computer science' or something, as the subject is computers or something like that. Discrete maths people studying graphs could have discovered the same ideas a decade earlier... but my inner sim of them says studying the internet is distasteful. It's just one graph, not some neatly defined class of abstract objects. It's data-driven. There likely aren't any neat theorems. Etc.
Complex systems has the opposite aesthetic: applying math to the real world matters. Important real-world systems are worth studying because of their real-world importance, not just for math beauty.
In my view, AI safety would be on a better track if this taste/aesthetic were more common. What we have now often either lacks what's good about physics (aiming for somewhat deep theories which generalize) or lacks what's good about the complexity-science branch of physics (reality orientation; the assumption that you often find cool math by looking at reality carefully, rather than by just looking for cool maths).
These are especially common, surprisingly perhaps, in AI and ML departments.
This is somewhat unsurprising given human psychology.
- Scaling up LLMs killed a lot of research agendas inside ML, particularly in NLP. Imagine your whole research career was built on improving benchmarks on some NLP problem using various clever ideas. Now, the whole thing is better solved by a three-sentence prompt to GPT-4, and everything everyone in the subfield worked on is irrelevant for all practical purposes... how do you feel? In love with scaled LLMs?
- Overall, what people often like about research is coming up with smart ideas, and there is some aesthetic going into it. What's traditionally not part of the aesthetic is 'and you also need to get $100M in compute', and it's reasonable to model a lot of people as having a part which hates this.
Part of ACS's research directions fit into this - Hierarchical Agency, Active Inference based pointers to what alignment means, Self-unalignment.
The simple math is active inference, and the type is almost entirely the same as 'beliefs'.
My impression is you get a lot of "the latter" if you run "the former" on the domain of language and symbolic reasoning, and often the underlying model is still S1-type. E.g.
rights inherent & inalienable, among which are the preservation of life, & liberty, & the pursuit of happiness
does not sound to me like someone did a ton of abstract reasoning to systematize other abstract values, but more like someone succeeded in writing words which resonate with "the former".
Also, I'm not sure why you think the latter is more important for the connection to AI. Current ML seems more similar to "the former": informal, intuitive, fuzzy reasoning.
Re self-unalignment: that framing feels a bit too abstract for me; I don't really know what it would mean, concretely, to be "self-aligned". I do know what it would mean for a human to systematize their values—but as I argue above, it's neither desirable to fully systematize them nor to fully conserve them.
That's interesting - in contrast, I have a pretty clear intuitive sense of a direction where some people have a lot of internal conflict, and as a result their actions are less coherent, and some people have less of that.
In contrast, in the case of humans whom you would likely describe as 'having systematized their values'... I often doubt what's going on. A lot of people who describe themselves as hardcore utilitarians seem to be... actually not that, but more resemble a system where a somewhat confused verbal part fights with other parts, which are sometimes suppressed.
Identifying whether there's a "correct" amount of systematization to do feels like it will require a theory of cognition and morality that we don't yet have.
That's where I think looking at what human brains are doing seems interesting. Even if you believe the low-level / "the former" is not what's going on with human theories of morality, the technical problem seems very similar, and the same math possibly applies.
"Systematization" seems like either a special case of the Self-unalignment problem.
In humans, it seems the post is somewhat missing what's going on. Humans are running something like this
...there isn't any special systematization and concretization process. All the time, there are models running at different levels of the hierarchy, and every layer tries to balance between prediction errors from more concrete layers, and prediction errors from more abstract layers.
How does this relate to "values"? From the low-level sensory experience of cold and a fixed prior on body temperature, the AIF system learns a more abstract and general "goal-belief" about the need to stay warm, and more abstract sub-goals about clothing, etc. At the end there is a hierarchy of increasingly abstract "goal-beliefs" about what I do, expressed relative to the world model.
What's worth studying here is how human brains manage to keep the hierarchy mostly stable.
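The balancing dynamic is easy to caricature in code. A deliberately minimal toy sketch (my own construction, not a claim about the brain's actual algorithm): each level holds one scalar estimate and takes gradient steps on the squared prediction errors against the level below and the level above.

```python
# Toy hierarchy: level 0 is most concrete (next to the observation),
# the last level is most abstract (next to the fixed prior).
def settle(n_levels, observation, prior, lr=0.1, steps=500):
    x = [0.0] * n_levels
    for _ in range(steps):
        new = []
        for i in range(n_levels):
            below = observation if i == 0 else x[i - 1]
            above = prior if i == n_levels - 1 else x[i + 1]
            # descend on (x-below)^2 + (x-above)^2: balance the
            # bottom-up and top-down prediction errors
            new.append(x[i] - lr * ((x[i] - below) + (x[i] - above)))
        x = new
    return x

# Sensory evidence says 1.0, the fixed prior says 0.0; each level
# settles between its neighbours -> approximately [0.75, 0.5, 0.25]
print(settle(3, observation=1.0, prior=0.0))
```

Fixing the prior (the "goal-belief") while letting lower levels track evidence is the active-inference flavour: the same update rule serves both belief revision and goal pursuit.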
Absent symbolic language, none of these are capable of transmitting significant general purpose world knowledge, and thus are irrelevant for the techno-cultural criticality.
It's likely literally not true, but if it was ... this proves my point, doesn't it?
"Symbolic language" is exactly the type of innovation which can be discontinuous, has a type "code" more than "data quantity", and unlocks many other things. For example more rapid and robust horizontal synchronization of brains (eg when hunting). Or yes, jump in effective quantity of information transmitted via other signals in time.
At the same time ...could be clearly discontinuous: you can teach actual apes sign language, and it seems plausible this would make them more fit, if done in the wild.
(It's actually somewhat funny that Eric Drexler has a hundred page report based exactly on the premise "AI models using human language is obviously stupid inefficiency, and you can make a jump in efficiency with more native-architecture-friendly format".
This does not seem obviously stupid: e.g., right now, if you want one model to transfer some implicit knowledge it has learned, the way to do it is to use the ML-native model to generate a shitload of natural-language examples, and train the other model on them, building the native representation again.)
I'll try to keep it short
All the cross-generational information channels you highlight are at rough saturation, so they're not able to contribute to the cross-generational accumulation of capabilities-promoting information.
This seems clearly contradicted by empirical evidence. Mirror neurons would likely be able to saturate what you assume is the brain's learning rate, so not transferring more learned bits is much more likely because the marginal cost of doing so is higher than that of other sensible options. Which is a different reason than "saturated, at capacity".
Firstly, I disagree with your statement that other species have "potentially unbounded ways how to transmit arbitrary number of bits". Taken literally, of course there's no species on earth that can actually transmit an *unlimited* amount of cultural information between generations
Sure. Taken literally, the statement is obviously false... literally nothing can store an arbitrary number of bits, because of the Bekenstein bound. More precisely, the claim is that the existing non-human ways of transmitting learned bits to the next generation do not, in practice, seem to be constrained by limits on how many bits they can transmit, but by some other limits (e.g. you can transmit more bits than the animal is capable of learning).
Secondly, the main point of my article was not to determine why humans, in particular, are exceptional in this regard. The main point was to connect the rapid increase in human capabilities relative to previous evolution-driven progress rates with the greater optimization power of brains as compared to evolution. Being so much better at transmitting cultural information as compared to other species allowed humans to undergo a "data-driven singularity" relative to evolution. While our individual brains and learning processes might not have changed much between us and ancestral humans, the volume and quality of data available for training future generations did increase massively, since past generations were much better able to distill the results of their lifetime learning into higher-quality data.
1. As explained in my post, there is no reason to assume ancestral humans were so much better at transmitting information as compared to other species
2. The qualifier in "they were better at transmitting *cultural* information" may (or may not) do a lot of work.
The crux is something like "what is the type signature of culture?". Your original post roughly assumes "it's just more data". But this seems very unclear: in a comment above yours, jacob_cannell confidently claims I miss the forest and guesses the critical innovation is "symbolic language". But, obviously, "symbolic language" is a very different type of innovation than "more data transmitted across generations".
Symbolic language likely:
- allows using any type of channel more effectively
- in particular, allows more efficient horizontal synchronization, enabling parallel computation across many brains
- overall sounds more like a software upgrade
Consider plain old telephone network wires: these have a surprisingly large intrinsic capacity, which isn't used that effectively by analog voice calls. Yes, when you plug a modem in on both sides you see a "jump" in capacity - but this is much more like a "software update", and can be more sudden.
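The modem analogy can be made quantitative with the Shannon-Hartley formula. The bandwidth and SNR numbers below are rough illustrative assumptions (not measurements of any real line), but they show the point: the wire's intrinsic capacity is roughly what late dial-up modems achieved, far more than analog voice encoding exploits, and unlocking it is purely an endpoint ("software") change.

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley channel capacity in bits per second."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

# Assumed, illustrative figures for an analog phone line:
# ~3.1 kHz usable bandwidth, ~35 dB signal-to-noise ratio.
capacity = shannon_capacity(3100, 35)
print(f"{capacity / 1000:.1f} kbit/s")  # ~36 kbit/s, near late-90s modem speeds
```

The wire itself never changed; only the coding scheme at the endpoints did, which is why the capacity jump could be sudden.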
Or a different example: empirically, it seems possible to teach various non-human apes sign language (their general-purpose predictive-processing brains are general enough to learn it). I would classify this as a "software" or "algorithm" upgrade. If someone did this to a group of apes in the wild, it seems plausible the knowledge of language would stick and make them differentially more fit. But teaching apes symbolic language sounds in principle different from "it's just more data" or "it's higher-quality data", and the implications for AI progress would be different.
it relies on resource overhang being a *necessary* factor,
My impression is that, compared to your original post, your model drifts to more and more general concepts, where it becomes more likely true, harder to refute, and less clear in its implications for AI. What is the "resource" here? Does negentropy stored in wood count as "a resource overhang"?
I'm arguing specifically against a version where the "resource overhang" is caused by "exploitable resources you easily unlock by transmitting more bits learned by your brain vertically to your offspring's brain", because your mapping of humans to AI progress is based on a quite specific model of what the bottlenecks and overhangs are.
If the current version of the argument is "sudden progress happens exactly when (resource overhang) AND ..." with "generally any kind of resource", then yes, this sounds more likely, but it seems very unclear what it implies for AI.
(Yes I'm basically not discussing the second half of the article)
I have a longer draft on this, but my current take is that the high-level answer to the question is similar for crabs and ontologies (& more).
Convergent evolution usually happens because of similar selection pressures + some deeper contingencies.
Looking at the selection pressures for ontologies and abstractions, there is a bunch of pressures which are fairly universal, and in various ways apply to humans, AIs, animals...
For example: negentropy is costly => flipping fewer bits and storing fewer bits is selected for; consequences include
- part of why we have concepts: clustering is compression
- discretization/quantization/coarse-graining: all are compression
...
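The "clustering is compression" point can be made concrete with a toy sketch (my example, not from the comment): instead of storing each observation at full precision, store a few cluster centers ("concepts") plus, per observation, only the index of the nearest center. The data and centers below are handpicked assumptions.

```python
# 1-D observations that naturally fall into three clusters.
data = [0.1, 0.12, 0.09, 5.0, 5.1, 4.9, 9.8, 10.0, 10.2]
centers = [0.1, 5.0, 10.0]  # assumed "concepts" for the sketch

def encode(x, centers):
    # Represent x by the index of its nearest center: a couple of bits, not a float.
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

codes = [encode(x, centers) for x in data]
print(codes)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Reconstruction from the compressed representation loses only a little.
decoded = [centers[c] for c in codes]
max_err = max(abs(x, ) if False else abs(x - y) for x, y in zip(data, decoded))
print(round(max_err, 2))  # 0.2
```

Storing fewer bits while keeping reconstruction error small is exactly the selection pressure the list above points at.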
The intentional stance is, to a decent extent, ~a compression algorithm, assuming some systems can be decomposed into "goals" and an "executor" (now the cat is chasing a mouse, now some other mouse). Yes, this is again not the full explanation, because it leads to the question of why there are systems in the territory for which this works, but it is a step.
My main answer is capacity constraints at central places. I think you are not considering how small the community was.
One somewhat representative anecdote: sometime in ~2019, at FHI, there was a discussion that the "AI ethics" and "AI safety" research communities seemed to be victims of an unfortunate polarization dynamic: even though, in the Platonic realm of ideas, the concerns tracked by the two communities are compatible, there was a somewhat unfortunate social dynamic where loud voices on both sides were extremely dismissive of the other community. My guess at that time was that the divide had a decent chance of exploding when AI worries went mainstream (like, arguments about AI risk facing vociferous opposition from the part of academia entrenched under the "ethics" flag), and my proposal was to do something about it, as there were some opportunities to pre-empt/heal this, e.g. by supporting people from both camps to visit each other's conferences, or writing papers explaining the concerns in the language of the other camp. Overall this was often specific and actionable. The only problem was ... "who has time to work on this", and the answer was "no one".
If you looked at what senior staff at FHI were working on, the counterfactual was e.g. Toby Ord writing The Precipice. I think even with the benefit of hindsight that was clearly more valuable - if today you see the UN Security Council discussing AI risk, and at least some people in the room have somewhat sane models, it's also because a bunch of people at the UN read The Precipice and started to think about x-risk and AI risk.
If you looked at junior people: I was already juggling quite a high number of balls, including research on active inference minds and implications for value learning, research on technical problems in comprehensive AI services, organizing the academic-friendly Human-aligned AI summer school, organizing the Epistea summer experiment, organizing ESPR, and participating in a bunch of CFAR things. Even in retrospect, I think all of these bets were better than me trying to do something about the expected harmful AI-ethics-vs-AI-safety flamewar.
Similarly, we had an early-stage effort on "robust communication", attempting to design a system for testing robustly good public communication about x-risk and similarly sensitive topics (including e.g. developing good shareable models of future problems fitting in the Overton window). It went nowhere because ... there just weren't any people. FHI had dozens of topics like that, where a whole org should have been working on them, but the actual attention was about 0.2 FTE of someone junior.
Overall, I think with the benefit of hindsight, a lot of what FHI worked on was more or less what you suggest should have been done. It's true that this was never in the spotlight on LessWrong - I guess in 2019 the prevailing LW sentiment would have been that Toby Ord engaging with the UN was most likely a useless waste of time.
What were the other options? Have you considered advising xAI privately, or re-directing xAI to be advised by someone else? Also, would the default be clearly worse?
As you are surely aware, one of the bigger fights about AI safety across academia, policymaking and public spaces now is the discussion about AI safety being a "distraction" from immediate social harms, and actually being the agenda favoured by the leading labs and technologists. (This often comes with accusations of attempted regulatory capture, worries about concentration of power, etc.)
In my view, given this situation, it seems valuable to have AI safety represented also by somewhat neutral coordination institutions without obvious conflicts of interest and large attack surfaces.
As I wrote in the OP, CAIS made some relatively bold moves to become one of the most visible "public representatives" of AI safety - including the name choice, and organizing the widely reported Statement on AI Risk (which was a success). Until now, my impression was that in taking that namespace, you also aimed for CAIS to be such a "somewhat neutral coordination institution without obvious conflicts of interest and large attack surfaces".
Maybe I was wrong, and you don't aim for this coordination/representative role. But if you do, advising xAI seems a strange choice for multiple reasons:
1. it makes you a somewhat less neutral party for the broader world; even if the link to xAI does not actually influence your judgement or motivations, I think on priors it's broadly sensible for policymakers, politicians and the public to suspect all kinds of activism, advocacy and lobbying efforts of having side-motivations or conflicts of interest, and this strengthens that suspicion
2. the existing public announcements do not inspire confidence in the safety mindset of xAI's founders; it seems unclear whether you also advised xAI about the plan to "align to curiosity"
3. if xAI turns out to be mostly interested in safety-washing, it's more of a problem if it's aided by a more central/representative org
Broadly agree that the failure mode is important; also I'm fairly confident basically all the listed mentors understand this problem of rationality education / "how to improve yourself" schools / etc., and I'd hope they can help participants avoid it.
I would subtly push back against optimizing for something like being measurably stronger on a timescale like 2 months. In my experience actually functional things in this space typically work by increasing the growth rate of [something hard to measure], so instead of e.g. 15% p.a. you get 80% p.a.
Because his approach does not conform to established epistemic norms on LessWrong, Adrian feels pressure to cloak and obscure how he develops his ideas. One way in which this manifests is his two-step writing process. When Adrian works on LessWrong posts, he first develops ideas through his free-form approach. After that, he heavily edits the structure of the text, adding citations, rationalisations and legible arguments before posting it. If he doesn’t "translate" his writing, rationalists might simply dismiss what he has to say.
cf. Limits to Legibility; yes, strong norms/incentives for "legibility" have this negative impact.
I broadly agree with something like "we use a lot of explicit S2 algorithms built on top of the modelling machinery described", so yes, what I mean applies more directly to the low level than to humans explicitly thinking about what steps to take.
I think practically useful epistemology for humans needs to deal with both "how is it implemented" and "what's the content". To use an ML metaphor: human cognition is built out of both "trained neural nets" and "chain-of-thought-type inferences in language" running on top of such nets. All S2 reasoning is a prediction, in a somewhat similar way as all GPT-3 reasoning is a prediction - the NN predictor learns how to make "correct predictions" of language, but because the domain itself is partially a symbolic world model, this maps onto predictions about the world.
In my view, some parts of traditional epistemology are confused in trying to do epistemology for humans basically only at the level of language reasoning, which is a bit like trying to fix LLM cognition just by writing smart prompts, ignoring that there is a huge underlying computation doing the heavy lifting.
I'm certainly in favour of attempts to do epistemology for humans which are compatible with what the underlying computation actually does.
I do agree you can go too far in the opposite direction, ignoring the symbolic reasoning ... but that seems rare when people think about humans?
2. My personal take on the dark room problem is that, in the case of humans, it is mostly fixed by "fixed priors" on interoceptive inputs. I.e., your body has evolutionarily older machinery to compute hunger. This gets fed into the predictive processing machinery as an input, and the evolutionarily sensible belief ("not hungry") gets fixed. (I don't think calling this "priors" was a good choice of terminology...)
This setup at least in theory rewards both prediction and action, and avoids dark room problems for practical purposes: let's assume I have this really strong belief ("fixed prior") that I won't be hungry an hour in the future. Conditional on that, I can compute what my other sensory inputs will be half an hour from now. A predictive model of me eating tasty food in half an hour is more coherent with me not being hungry than a predictive model of me reading a book - but this does not need to be hardwired; it can be learned.
Given that evolution has good reasons to "fix priors" on multiple evolutionarily relevant inputs, I would not expect actual humans to seek dark rooms, but I would expect the PP system to occasionally seek ways to block or modify the interoceptive signals.
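The mechanism above can be sketched in a few lines (my toy formalization, not from the comment; action names and numbers are made up): the agent holds an unchangeable belief "I will not be hungry", and picks the action whose predicted interoceptive consequences minimize error against that belief. Sitting in a dark room makes vision perfectly predictable, but does nothing for the fixed interoceptive prior.

```python
FIXED_PRIOR_HUNGER = 0.0  # evolution-fixed belief about future interoception

current_hunger = 0.6
# Learned predictions of future hunger under each action (assumed numbers):
predicted_hunger = {
    "eat tasty food": 0.0,
    "read a book": current_hunger + 0.2,       # hunger keeps rising
    "sit in dark room": current_hunger + 0.2,  # darkness only makes *vision*
                                               # predictable, not interoception
}

def interoceptive_error(action):
    # Prediction error between the fixed prior and the action's predicted outcome.
    return abs(predicted_hunger[action] - FIXED_PRIOR_HUNGER)

best = min(predicted_hunger, key=interoceptive_error)
print(best)  # eat tasty food
```

Note that which action best satisfies the fixed prior (here, eating) can be learned; only the prior itself is hardwired, matching the comment's point.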
3. My impression of how you use 'frames' is ... the central examples are more like somewhat complex model ensembles, including some symbolic/language-based components, rather than e.g. a "there is gravity" frame or a "model of an apple" frame. My guess is this will likely be useful in practice, but for attempts to formalize it, I think a better option is to start with the existing HGM maths.
So far it seems like you are broadly reinventing concepts which are natural and understood in predictive processing and active inference.
Here is a rough attempt at a translation / pointer to what you are describing: what you call frames is usually called predictive models or hierarchical generative models in the PP literature.
- Unlike logical propositions, frames can't be evaluated as discretely true or false.
Sure: predictive models are evaluated based on prediction error, which is roughly a combination of the ability to predict the outputs of lower-level layers, not deviating too much from the predictions of higher-order models, and being useful for modifying the world.
- Unlike Bayesian hypotheses, frames aren't mutually exclusive, and can overlap with each other.
Sure: predictive models overlap, and it is somewhat arbitrary where you would draw the boundaries of individual models. E.g. you can draw a very broad boundary around a model called microeconomics, and a very broad boundary around a model called Buddhist philosophy, but both models likely share some parts modelling something like human desires.
- Unlike in critical rationalism, we evaluate frames (partly) in terms of how true they are (based on their predictions) rather than just whether they've been falsified or not.
Sure: actually, science roughly is "cultural evolution rediscovering active inference". Models are evaluated based on prediction error.
- Unlike Garrabrant traders and Rational Inductive Agents, frames can output any combination of empirical content (e.g. predictions about the world) and normative content (e.g. evaluations of outcomes, or recommendations for how to act).
Sure: actually, the "any combination" goes even further. In active inference, there is no strict type difference between predictions about stuff like "what photons hit the photoreceptors in your eyes" and stuff like "what the position of your muscles should be". Recommendations for how to act are just predictions about your actions, conditional on wishfully oriented beliefs about future states. Evaluations of outcomes are just prediction errors between wishful models and observations.
- Unlike model-based policies, policies composed of frames can't be decomposed into modules with distinct functions, because each frame plays multiple roles.
Mostly, but this description seems a bit confused. "This has a distinct function" is a label you slap on a computation using the design stance, if the design-stance description is much shorter than the alternatives (e.g. the physical-stance description). In the case of hierarchical predictive models, you can imagine drawing various boundaries around various parts of the system (e.g., you can imagine including or not including the layers computing edge detection in a model tracking whether someone is happy, and in the other direction, including or not including layers with some abstract conception of hedonic utilitarianism vs. some transcendental purpose). Once you select a boundary, you can sometimes assign a "distinct function" to it, sometimes more than one, sometimes a "distinct goal", etc. It's just a question of how useful the physical/design/intentional stances are.
- Unlike in multi-agent RL, frames don't interact independently with their environment, but instead contribute towards choosing the actions of a single agent.
Sure: this is exactly what hierarchical predictive models do in PP. Different models are competing all the time over predictions about what will happen, or what we will do.
Assuming this more or less shows that what you are talking about is mostly hierarchical generative models from active inference, here are more things the same model predicts:
a. Hierarchical generative models are the way people do perception. Prediction error is minimized between a stream of predictions from the upper layers (containing deep models like "the world has gravity" or "communism is good") and a stream of errors from the direction of the senses. Given that, what is naively understood as "observations" is a more complex phenomenon, where e.g. a leaf flying sideways is interpreted given strong priors like "there is gravity pointing downward, and an atmosphere", and given those, the model predicting "wind is blowing" decreases the sensory prediction error. Similarly, someone being taken into custody by the KGB is, under the upstream prior "Soviet communism is good", interpreted as the person likely being a traitor. The competing broad model "Soviet communism is an evil totalitarian dictatorship" could actually predict the same person being taken into custody, just interpreting it as the system persecuting dissidents.
b. It is possible to look at parts of this modelling machinery wearing the intentional stance hat. If you do this, the system looks like a multi-agent mind, and you can
- derive a bunch of IFC/ICF style of intuitions
- see parts of it as economic interaction or a market - the predictive models compete for making predictions, "pay" a complexity cost, and are rewarded for making "correct" predictions ("correct" here meaning minimizing error between the model and reality, which can include changing the reality, aka pursuing goals)
The main difference from naive/straightforward multi-agent mind models is that the "parts" live within a generative model, and interact with it and through it, not through the world. They don't have any direct access to reality, and compete at the same time for interpreting sensory inputs and predicting actions.
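The "compete for predictions while paying a complexity cost" dynamic can be sketched as a toy weighting scheme (my formalization, not from the comment; the model names, error figures, and trade-off constant are all made up): each model's weight decays exponentially with its recent prediction error plus a penalty for its complexity, a minimum-description-length-flavored competition.

```python
import math

models = {
    # name: (summed prediction error on recent data, complexity cost)
    "simple habit model": (4.0, 1.0),
    "rich social model": (1.0, 3.0),
    "conspiracy epicycles": (0.9, 9.0),  # fits slightly better, hugely complex
}
LAMBDA = 1.0  # assumed trade-off between fit and complexity

# Unnormalized score: models "pay" for both misprediction and complexity.
scores = {name: math.exp(-(err + LAMBDA * cx)) for name, (err, cx) in models.items()}
total = sum(scores.values())
weights = {name: s / total for name, s in scores.items()}

for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {w:.2f}")
# The slightly-better-fitting but very complex model loses the competition
# to the moderately complex one.
```

The sketch also illustrates the market metaphor from the list above: weights act like budget shares that models win by predicting well and cheaply.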
This seems to be partially based on a (common?) misunderstanding of CAIS as making predictions about the concentration of AI development/market power. As far as I can tell, this wasn't Eric's intention: I specifically remember Eric mentioning he could easily imagine the whole "CAIS" ecosystem living on one floor of the DeepMind building.
Thanks for the reply. Also for the work - it's great that signatures are being added - before, I had checked the bottom of the list and it seemed to be either the same or with very few additions.
I do understand that verification of signatures requires some amount of work. In my view, having more people (could be volunteers) to quickly process the initial expected surge of signatures would have been better; attention spent on this will drop fast.
I feel somewhat frustrated by the execution of this initiative. As far as I can tell, no new signatures have been published since at least one day before the public announcement. This means that even if I asked someone famous (at least in some subfield or circles) to sign, and the person signed, their name is not on the list, leading to their understandable frustration. (I already got a piece of feedback in the direction of "the signatories are impressive, but the organization running it seems untrustworthy".)
Also, if the statement is intended to serve as a beacon, allowing people who have previously been quiet about AI risk to connect with each other, it's essential for signatures to be published. It's nice that Hinton et al. signed, but for many people in academia it would be practically useful to know who from their institution signed - it's unlikely that most people will find collaborators in Hinton, Russell or Hassabis.
I feel even more frustrated because this is the second time a similar effort has been executed by the x-risk community while lacking the basic operational competence to accept and verify signatures. So, I make this humble appeal and offer to the organizers of any future public statements collecting signatures: if you are able to write a good statement and secure the endorsement of some initial high-profile signatories, but lack the capacity to accept, verify and publish more than a few hundred names, please reach out to me - it's not that difficult to find volunteers for this work.
I don't think the way you imagine the perspective inversion captures typical ways of arriving at e.g. a 20% doom probability. For example, I do believe there are multiple good things which can happen/be true and decrease p(doom), and I put some weight on them:
- we do discover some relatively short description of something like "harmony and kindness"; this works as an alignment target
- enough of morality is convergent
- AI progress helps with human coordination (could be in costly way, eg warning shot)
- it's convergent to massively scale alignment efforts with AI power, and these solve some of the more obvious problems
I would expect prevailing doom conditional on only small efforts to avoid it, but I do think the actual efforts will be substantial, and this moves the chances to ~20-30%. (Also, I think most of the risk comes from not being able to deal with complex systems of many AIs and the economy decoupling from humans; I expect single-single alignment to be solved sufficiently to prevent takeover by a single system by default.)