"Wanting" and "liking" 2023-08-30T14:52:04.571Z
GPTs' ability to keep a secret is weirdly prompt-dependent 2023-07-22T12:21:26.175Z
How do you manage your inputs? 2023-03-28T18:26:36.979Z
Mateusz Bagiński's Shortform 2022-12-26T15:16:17.970Z
Kraków, Poland – ACX Meetups Everywhere 2022 2022-08-24T23:07:07.542Z


Comment by Mateusz Bagiński (mateusz-baginski) on The possible shared Craft of deliberate Lexicogenesis · 2023-09-08T07:59:36.202Z · LW · GW

I'd want to look at what kinds of tasks / what kinds of thinking they are doing.

I don't have specific examples in the literature of people without an internal monologue, but here's a case of a person who apparently can do music without doing something very bound up with auditory imagination.

A case study of subject WD (male, 55) with sensory agnosia (auditory and visual) is reported. He describes his experiences with playing music to be similar to the experiences of people suffering from blindsight, maneuvering blindly in the auditory space, without the ability to imagine results of next move (hitting piano key). Yet after a long period of learning WD is able to improvise, surprising himself with correct cadencies, with no conscious influence on what he is playing. For him the only way to know what goes on in his brain is to act it out.

Anecdotal case: I worked with a person who claimed to have absolutely no inner monologue and "thinking in one's head" seemed very weird to her. She's one of the most elaborate arguers I know. A large part of her job at the time was argument mapping.

All I've read about dath ilan is the thing about moving houses around on wires. Where is it described what they do with language?

Mostly smeared across ProjectLawful (at least that's where I read about all of it). Usually, it's brought up when Keltham (the protagonist from dath ilan) gets irritated that Taldane (the language of the D&D world he was magically transported into) doesn't have a short word (or doesn't have a word at all) for an important concept that obviously should have a short word. Some excerpts (not necessarily very representative ones, just what I was able to find with a quick search):

Occasionally Keltham thinks single-syllable or two-syllable words in Baseline that refer to mathematical concepts built on top of much larger bases, fluidly integrated into his everyday experience. link

The Baseline phrase for this trope is a polysyllabic monstrosity that would literally translate as Intrinsic-Characteristic Boundary-Edge. A translation that literal would be misleading; the second word-pair of Boundary-Edge is glued together in the particular way that indicates a tuple of words has taken on a meaning that isn't a direct sum of the original components. A slight lilt or click of spoken Baseline; a common punctuation-marker in written Baseline. link

"We've pretty much got a proverb in nearly those exact words, yeah." He utters it in Baseline: an eight-syllable couplet, which rhymes and scans because Baseline was designed in part to make that proverb be a rhyming couplet. link

Comment by Mateusz Bagiński (mateusz-baginski) on Against Almost Every Theory of Impact of Interpretability · 2023-09-05T13:27:15.543Z · LW · GW

A feature is still a fuzzy concept,

"Gene", "species", and even "concept" are also fuzzy concepts but despite that, we managed to substantially improve our understanding of the-things-in-the-world-they-point-to and the phenomena they interact with. Using these fuzzy concepts even made us realize how fuzzy they are, what the nature of their fuzziness is, and what other (more natural/appropriate/useful/reality-at-joint-carving) abstractions we may replace them with.[1] In other words, we can use fuzzy concepts as a ladder/provisional scaffold for understanding. Once our understanding is good enough, we may realize there's a better foundation for the theory than the one that guided us there in the first place. (See: Context of Discovery and Context of Justification)

Or maybe interp could be useful for retargeting the search? This idea suggests that if we find a goal in a system, we can simply change the system's goal and redirect it towards a better goal.

I think this is a promising quest, even if there are still difficulties:

One difficulty you don't list is that it is not clear ex ante that the models we want to steer/retarget are going to have a "goal slot" or, more generally, something that could be used as a motivational API (a "telopheme" in Tsvi's terminology). This does seem to be the case (at least to a significant extent) in the cases studied by Turner et al. but as you point out, the results from smaller models already fail to translate to/predict what we're finding in bigger models (induction heads being a notable exception).

Instrumental convergence makes this problem even murkier. On the one hand, it may lead you to expect that the "goal part"/utility function of the agent will be separated from the rest in order to facilitate goal preservation. At the same time (1) if this would make it easier for us to steer/retarget the AI, then it would be advantageous for the AI to make this part of itself more obscure/less understandable to us; and (2) an AI need not have a clearly factored out goal to be sufficiently smarter than humans to pose an x-risk (see Soares).

I am skeptical that we can gain radically new knowledge from the weights/activations/circuits of a neural network that we did not already know, especially considering how difficult it can be to learn things from English textbooks alone.

One way this could work is: if we have some background knowledge/theory of the domain the AI learns about, then the AI may learn some things that we didn't know but that (conditional on sufficiently good transparency/interpretability/ELK)[2] we can extract from it in order to enrich our understanding.

The important question here is: will interp be better for that than more mundane/behavioral methods? Will interp find things that behavioral methods won't, or find them more efficiently (for whatever measure of efficiency)?

Total explainability of complex systems with great power is not sufficient to eliminate risks. 

Also, a major theme of Inadequate Equilibria.

Conceptual advances are more urgent.

Obvious counterpoint: in many subdomains of many domains, you need a tight feedback loop with reality to make conceptual progress. Sometimes you need a very tight feedback loop to rapidly iterate on your hypotheses. Also, getting acquainted with low-level aspects of the system lets you develop some tacit knowledge that usefully guides your thinking about the system.

Obvious counter-counterpoint: interp is nowhere near the level of being useful for informing conceptual progress on the things that really matter for AInotkillingeveryone.

  1. ^

    My impression is that most biologists agree that the concept of "species" is "kinda fake", but less so when it comes to genes and concepts.

  2. ^

    Which may mean much better than what we should expect to have in the next N years.

Comment by Mateusz Bagiński (mateusz-baginski) on Gemini modeling · 2023-09-04T17:52:39.812Z · LW · GW

Yeah, maybe it makes more sense. B' would be just a subcategory of B that is sufficient for defining (?) Xp (something like Markov blanket of Xp?). The (endo-)functor from B' to B would be just identity and the relationship between Xp and [Xp] would be represented by a natural transformation?

Comment by Mateusz Bagiński (mateusz-baginski) on Gemini modeling · 2023-09-04T16:20:56.223Z · LW · GW

Sounds close/similar-to-but-not-the-same-as a categorical limit (if I understand you and category theory sufficiently correctly).

(Switching back to your notation)

Think of the modeler-mind B and the modeled-mind A as categories where objects are elements (~currently) possibilizable by/within the mind.

Gemini modeling can be represented by a pair of functors:[1]

  1. F : A → B, which maps the things/elements in A to the same things/elements in B (their "sameness" determined by identical placement in the arrow-structure).[2] In particular, it maps the actualized X in A to the also actualized X in B.
  2. G : A → B, which collapses all the elements of A to Xp in B.

For every arrow Y → X in B, there is a corresponding arrow Y → Xp, and the same for arrows going in the other direction. But there is no isomorphism between X and Xp. They can fill the same roles but they are importantly different (the difference between "B believing the same thing that A believes" and "B modeling A as believing that thing").

Now, for every other "candidate" Xp' that satisfies the criterion from the previous paragraph, there is a unique arrow Xp' → Xp that factorizes its morphisms through Xp.

On second thought, maybe it's better to just have two almost identical functors, differing only by what X is mapped onto? (I may also be overcomplicating things, pareidolizing, or misunderstanding category theory)

  1. ^

    I'm not sure what the arrows are, unfortunately, which is very unsatisfying

  2. ^

    On second thought, for this to work, we probably need to either restrict ourselves to a subset of each mind or represent them sufficiently coarse-grained-ly.
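For what it's worth, the "two almost identical functors" variant could be written down as something like the following sketch (the notation here is mine, not necessarily what was intended):

```latex
% \mathcal{B} = the modeler-mind, \mathcal{A} = the modeled-mind (as categories);
% X = the modeled element of \mathcal{A}, X_p = its counterpart ("model of X") in \mathcal{B}.
F, G \colon \mathcal{A} \longrightarrow \mathcal{B}, \qquad
F(Y) = Y \quad \text{for all } Y, \qquad
G(Y) =
  \begin{cases}
    X_p & \text{if } Y = X,\\
    Y   & \text{otherwise.}
  \end{cases}
```

The relationship between the element and its model could then be a natural transformation \eta \colon F \Rightarrow G whose only non-identity component is \eta_X \colon X \to X_p.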

Comment by Mateusz Bagiński (mateusz-baginski) on Meta Questions about Metaphilosophy · 2023-09-04T14:39:13.808Z · LW · GW

On the one hand, I like this way of thinking and IMO it usefully dissolves diseased questions about many superficially confusing mind-related phenomena. On the other hand, in the limit it would mean that mathematical/logical/formal structures are real only to the extent that they are in some way implemented or implementable by physical systems... and once I spelled that out, I realized that maybe I don't disagree with it at all.

Comment by Mateusz Bagiński (mateusz-baginski) on The possible shared Craft of deliberate Lexicogenesis · 2023-09-04T13:08:44.458Z · LW · GW

think with Words

There are people who (report/claim that they) don't think in words. It looks like having an internal monologue is a spectrum, perhaps related to the spectra of aphantasia. (Or maybe I'm misunderstanding what you mean here.)

natural language is how we think

Again, are you gesturing towards something like Language of Thought?

Mathematicians create fractal vocabularies, making names from notation, from mathematicians (eponyms),

AFAIK, eponyms (naming inventions after their inventors) are ~unique to the West/WEIRD culture. (source: The WEIRDest People in the World; cites Wootton's The Invention of Science)

A wider net of possible words (a more expressive morphemicon) catches a wider variety of upsparks.

Maybe completely unrelated, but this reminds me of the observation that the set of 20-ish basic amino acids used by terran life seems optimized for covering a sufficiently diverse range of parameters of amino-acid space.

Request for term: a plural non-person pronoun

Trivium/datapoint: Polish has three grammatical genders in the singular (standardly: masculine, feminine, and neuter) but two in the plural (plural-personal-masculine and plural-everything-else). Closely related Czech also has the same three grammatical genders in the singular, but there they don't collapse under pluralization, e.g., there are separate "they"s for "plural-he", "plural-she", and "plural-it".

Request for term: more flexible pronouns.

Also, my impression is that Lojban has some of the features you're thinking about (?)

Request for term: sometimes a person says A in context C1 meaning M1, and then says A in context C2 meaning M2, and M1 ≠ M2. What do you call A, M1, and M2?

(Partially) parametrized concepts?

Overall, I'm slightly surprised by no mention of dath ilan, as they seem to have invested quite a lot of labor-hours into optimizing language, including in some of the directions you sketch out in this post.

Comment by Mateusz Bagiński (mateusz-baginski) on Gemini modeling · 2023-08-28T16:28:59.152Z · LW · GW

My attempt to summarize the idea. How accurate is it?

Gemini modeling - Agent A models another agent B when A simulates some aspect of B's cognition by counterfactually inserting an appropriate cognitive element Xp (that is a model of some element X, inferred to be present in B's mind) into an appropriate slot of A's mind, such that the relevant connections between X and other elements of B's mind are reflected by analogous connections between Xp and corresponding elements of A's mind.

Comment by Mateusz Bagiński (mateusz-baginski) on "Concepts of Agency in Biology" (Okasha, 2023) - Brief Paper Summary · 2023-08-26T15:34:10.459Z · LW · GW

"The 'organism-as-agent' idea is seen by some authors as a potential corrective to gene-centrism, emphasizing as it does the power of organisms to make choices, overcome challenges, modify their environment, and shape their own fate. "


I don't see much friction here. It seems perfectly coherent with gene-centrism that sometimes the optimal thing for a gene to do in order to "advance its fitness" is to make an agentic organism that competently pursues some goals/acts on some drives that are beneficial for the "selfish" gene.


(Nor is it entirely clear, for that matter, whether they should, i.e. whether "long range consequentialism" points at anything real and coherent).

I agree that it may not point to anything real (in the sense of "realizable in our universe"), but I'd be curious to hear why you think it may not even be coherent and in what sense.

I wonder whether, by trying to formalize the fuzzy concept of agency (plausibly originating in an inductive bias that was selected for because of the need to model animals of one's own and different species) that we already have, we are not shooting ourselves in the foot. A review of what relationships people see between "agency" and other salient concepts (autonomy, goals, rationality, intelligence, "internally originated" behavior, etc.) is probably valuable just for locally disentangling our semantic web and giving us more material to work with, but perhaps we should aim for finding more legible/clearly specified/measurable properties/processes in the world that we consider relevant and build our ontology bottom-up from them.

Comment by Mateusz Bagiński (mateusz-baginski) on chinchilla's wild implications · 2023-08-25T13:11:48.622Z · LW · GW

I think I remember seeing somewhere that LLMs learn more slowly on languages with "more complex" grammar (in the sense of their loss decreasing more slowly per the same number of tokens) but I can't find the source right now.

Comment by Mateusz Bagiński (mateusz-baginski) on Why Is No One Trying To Align Profit Incentives With Alignment Research? · 2023-08-23T14:13:12.581Z · LW · GW
Comment by Mateusz Bagiński (mateusz-baginski) on Please don't throw your mind away · 2023-08-22T16:00:25.309Z · LW · GW

Groups, but without closure?

Comment by Mateusz Bagiński (mateusz-baginski) on Explaining “Hell is Game Theory Folk Theorems” · 2023-08-11T13:20:38.234Z · LW · GW

I must be missing something, so I'd appreciate an explanation.

Say somebody plays 30 and gets punished by everybody else playing 100 forever. The "defector" doesn't gain anything by increasing their temperature from 30, so they stay at 30. Why would any of the punishers continue playing 100, if they would be better off lowering the temperature, and the other punishers couldn't punish them anyway, since the temperature is already permanently fixed at 100?
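To make the question concrete, here's a toy payoff model (my assumption, not necessarily the post's exact setup): each of n players sets a dial, the room temperature is the average of all dials, and each player's per-round utility is minus the temperature.

```python
# Toy payoff model for the temperature game (an assumed setup, for illustration):
# n players each set a dial in [30, 100]; the room temperature is the average
# of all dials; everyone's per-round utility is minus that temperature.

def utility(dials):
    return -sum(dials) / len(dials)

n = 100

# Punishment phase: 99 punishers play 100 against a defector who plays 30.
u_punisher_stay = utility([100] * (n - 1) + [30])

# One punisher unilaterally lowers their own dial from 100 to 30.
u_punisher_deviate = utility([100] * (n - 2) + [30, 30])

# The deviating punisher is strictly better off.
assert u_punisher_deviate > u_punisher_stay
```

Under this model, a punisher strictly gains by lowering their dial during the punishment phase, which is exactly the puzzle: threatening to punish a non-punishing punisher has no teeth if the temperature is already at its maximum.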

Comment by Mateusz Bagiński (mateusz-baginski) on Mateusz Bagiński's Shortform · 2023-07-21T09:48:11.074Z · LW · GW

I've read the SEP entry on agency and was surprised by how irrelevant it feels to whatever it is that makes me interested in agency. Here I sketch some of these differences by comparing an imaginary Philosopher of Agency (roughly the embodiment of the approach that the "philosopher community" seems to have to these topics), and an Investigator of Agency (roughly the approach exemplified by the LW/AI Alignment crowd).[1]

If I were to put my finger on one specific difference, it would be that Philosopher is looking for the true-idealized-ontology-of-agency-independent-of-the-purpose-to-which-you-want-to-put-this-ontology, whereas Investigator wants a mechanistic model of agency, which would include a sufficient understanding of goals, values, dynamics of development of agency (or whatever adjacent concepts we're going to use after conceptual refinement and deconfusion), etc.

Another important component is the readiness to take one's intuitions as the starting point, but also assume they will require at least a bit of refinement before they start robustly carving reality at its joints. Sometimes you may even need to discard almost all of your intuitions and carefully rebuild your ontology from scratch, bottom-up. Philosopher, on the other hand, seems to (at least more often than Investigator) implicitly assume that their System 1 intuitions can be used as the ground truth of the matter and the quest for formalization of agency ends when the formalism perfectly captures all of our intuitions and doesn't introduce any weird edge cases.

Philosopher asks, "what does it mean to be an agent?" Investigator asks, "how do we delineate agents from non-agents (or specify some spectrum of relevant agency-adjacent) properties, such that they tell us something of practical importance?"

Deviant causal chains are posed as a "challenge" to "reductive" theories of agency, which try to explain agency by reducing it to causal networks.[2] So what's the problem? Quoting:

… it seems always possible that the relevant mental states and events cause the relevant event (a certain movement, for instance) in a deviant way: so that this event is clearly not an intentional action or not an action at all. … A murderous nephew intends to kill his uncle in order to inherit his fortune. He drives to his uncle’s house and on the way he kills a pedestrian by accident. As it turns out, this pedestrian is his uncle.

At least in my experience, this is another case of a Deep Philosophical Question that no longer feels like a question, once you've read The Sequences or had some equivalent exposure to the rationalist (or at least LW-rationalist) way of thinking.

About a year ago, I had a college course in philosophy of action. I recall having some reading assigned, in which the author basically argued that for an entity to be an agent, it needs to have an embodied feeling-understanding of action. Otherwise, it doesn't act, so can't be an agent. No, it doesn't matter that it's out there disassembling Mercury and reusing its matter to build the Dyson Sphere. It doesn't have the relevant concept of action, so it's not an agent.

  1. This is not a general diss on philosophizing, I certainly think there is value in philosophy-like thinking. ↩︎

  2. My wording, not SEP's, but I think it's correct. ↩︎

Comment by Mateusz Bagiński (mateusz-baginski) on Do humans derive values from fictitious imputed coherence? · 2023-07-17T15:00:58.238Z · LW · GW

A potential culture-level historical case of FIAT: AFAIK, Jewish monotheism emerged sometime in the 5th century BCE, during or in the aftermath of the Babylonian captivity. Before that, Jews were henotheistic, with a slight "preference" for JHWH. When their country was conquered, they reasoned something like: "we must have insulted the God with our cult of other gods (otherwise he wouldn't have allowed the Babylonians to enslave us), so let's erase all explicit mentions of polytheism from the scriptures and ban the worship of non-JHWH gods".

Also, I wonder how FIAT relates to the uniquely human capacity and tendency/drive to overimitate others. Is overimitation tied to inferring latent reasons for behavior, such that re-applying that mode of thinking to one's past self results in FIAT?

Comment by Mateusz Bagiński (mateusz-baginski) on Consciousness as a conflationary alliance term · 2023-07-12T11:19:47.355Z · LW · GW

Interesting post. I definitely agree that consciousness is a very conflated term. Also, I wouldn't have thought that some people would pick the vestibular sense or proprioception as the closest legible neighbors/subparts of (what they mean by) "consciousness". A few nitpicks:

  • When the interviewees said that they consider ("their kind of") consciousness valuable, did they mean it personally (e.g., "I appreciate having it") or did they see it as a source of moral patienthood ("I believe creatures capable of introspection have moral value, or at least more value than those that can't introspect")?
  • A person may just have a very fuzzy nebulous concept which they have never tried to reduce to something else (and make sure that this reduced notion holds together with the rest of their world model). In that case, when asked to sit down, reflect, and describe it in terms of moving parts, they may simply latch onto a close neighbor in their personal concept space.
  • Can you point to two specific examples of conflationary alliances formed around drastically different understandings of this concept? I.e., these 3 people use "consciousness" to refer to X and these to mean Y, and this is a major theme in their "conflict". The examples you gave in part 2 look to me like typical usage of mysterious answers as semantic stop signs.
Comment by Mateusz Bagiński (mateusz-baginski) on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T07:23:57.721Z · LW · GW

Don't you think that once scaling hits the wall (assuming it does), the influx of people will be redirected towards natural philosophy of Intelligence and ML?

Comment by Mateusz Bagiński (mateusz-baginski) on Using (Uninterpretable) LLMs to Generate Interpretable AI Code · 2023-07-05T16:07:01.225Z · LW · GW

I mostly second Beren's reservations, but given that current models can already improve sorting algorithms in ways that didn't occur to humans (ref), I think it's plausible that they will prove useful in generating algorithms for automating interpretability and the like. E.g., some elaboration on ACDC, or ROME, or MEMIT.

Comment by Mateusz Bagiński (mateusz-baginski) on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-25T19:47:16.680Z · LW · GW

Her, Mitchell is a woman

Comment by Mateusz Bagiński (mateusz-baginski) on Why am I Me? · 2023-06-25T14:46:40.369Z · LW · GW

Take the Doomsday Argument(DA) as an example. It proposes the uninformed prior of one's own birth rank among all human beings ought to be uniformly distributed from the 1st to the last. So learning our actual birth rank (we are around the 100 billionth) should shift our belief about the future toward earlier extinction. E.g. I am more likely to be the 100 billionth person if there are only 200 Billion humans overall rather than 200 Trillion. So the fact I'm the 100 billionth means the former is more likely.

This is not how DA goes. It goes like this: if we are just as likely to be any particular observer, then the probability of being born in any particular time period is proportional to how many observers are alive in that period. So, on the most outside-ish view (i.e., not taking into account what we know about the world, what we can predict about how the future is going to unfold, etc.), we should assign the most probability mass to those distributions of observers over history where the time we were born had the most observers, which means that in the future there are going to be fewer observers than right now, probably because of extinction or something.

Also, if you assume the number of people alive throughout history is "fixed" at 200 billion, "you" are just as likely to be the first person as the 100 billionth or the 200 billionth. Predicting that you are the 100 billionth at best minimizes the prediction error/deviation score.
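The rank-based update quoted at the top can be spelled out as a toy Bayesian calculation (illustrative numbers and a uniform prior over just two hypotheses, purely for concreteness):

```python
from fractions import Fraction

# Two hypotheses about the total number of humans who will ever live:
hypotheses = {
    "short_history": 200 * 10**9,   # 200 billion humans ever
    "long_history": 200 * 10**12,   # 200 trillion humans ever
}
prior = {h: Fraction(1, 2) for h in hypotheses}

birth_rank = 100 * 10**9  # we observe being roughly the 100-billionth human
assert all(birth_rank <= n for n in hypotheses.values())  # rank possible under both

# Self-sampling likelihood: given N humans total, any particular birth
# rank r <= N has probability 1/N.
unnormalized = {h: prior[h] * Fraction(1, n) for h, n in hypotheses.items()}
z = sum(unnormalized.values())
posterior = {h: p / z for h, p in unnormalized.items()}
```

The posterior odds equal the ratio of the two population sizes, 1000:1 in favor of the smaller one.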

Comment by Mateusz Bagiński (mateusz-baginski) on Endo-, Dia-, Para-, and Ecto-systemic novelty · 2023-06-20T10:27:57.049Z · LW · GW

Pidgins, being unstable and noncanonical, witness the ectosystemic nature: the foreign languages don't integrate. Creoles, however, could be dubbed "systemopoetic novelty"--like parasystemic novelty, in nucleating a system, but more radical, lacking a broader system to integrate into.

The way I think about it: ectosystemic novelty in the form of approximately unintelligible language-(community-)systems living by each other necessitates the development of an interface (pidgin), which constitutes parasystemic novelty. Initially, it has much less structure and is much less self-standing than any of the prior fully structured languages, but children growing up with it imbue it with structure, developing it into a creole, which becomes a new system, independent of its "parent languages".

Is this different from what you wrote, or is it basically the same thing and I'm missing something?

Comment by Mateusz Bagiński (mateusz-baginski) on Matt Taibbi's COVID reporting · 2023-06-15T18:24:47.414Z · LW · GW

This was probably a factor but also:

IIRC, research at WIV was done in collaboration with EcoHealth Alliance and/or other similar US-based orgs. Granting WIV the BSL-4 status necessary for this kind of gain-of-function research was in part based on their assessments. The US establishment had a reason to cover it up because it was in part their own fuckup.

Comment by Mateusz Bagiński (mateusz-baginski) on Philosophical Cyborg (Part 1) · 2023-06-15T12:16:49.075Z · LW · GW

There doesn’t seem to be a consensus on what philosophy does or even what it is.  One view of philosophy is that it is useless, or actively unhelpful, for alignment (at least of the ‘literally-don’t-kill-everyone’ variety, particularly if one’s timelines are short): it isn’t quantifiable, involves interminable debates, and talks about fuzzy-bordered concepts sometimes using mismatched taxonomies, ontologies, or fundamental assumptions


IMO it's accurate to say that philosophy (or at least the kind of philosophy that I find thought-worthy) is a category that includes high-level theoretical thinking that either (1) doesn't fit neatly into any of the existing disciplines (at least not yet) or (2) is strongly tied to one or some of them but engages in high-level theorizing/conceptual engineering/clarification/reflection to the extent that is not typical of that discipline ("philosophy of [biology/physics/mind/...]").

(1) is also contiguous with the history of the concept. At some point, all of science (perhaps except mathematics) was "(natural) philosophy". Then various (proto-)sciences started crystallizing, and what was not seen as deserving of its own department remained in the philosophy bucket.

Comment by Mateusz Bagiński (mateusz-baginski) on The shard theory of human values · 2023-06-03T09:56:47.830Z · LW · GW

I wonder how the following behavioral patterns fit into Shard Theory

  • Many mammalian species have a strong default aversion to the young of their own species. They (including females) deliberately avoid contact with the young and can even be infanticidal. Physiological events associated with pregnancy (mostly hormonal) rewire the mother's brain such that when she gives birth, she immediately takes care of the young, grooms them, etc., something she has never done before. Do you think this can be explained by a rewiring of her reward circuit such that she finds simple actions associated with the pups highly rewarding and then bootstraps to learning complex behaviors from that?
  • Salt-starved rats develop an appetite for salt and are drawn to stimuli predictive of extremely salty water, even though on all previous occasions they found it extremely aversive, which had caused them to develop a conditioned fear response to the cue predictive of salty water (see Steve's post on this experiment).
Comment by Mateusz Bagiński (mateusz-baginski) on AI #13: Potential Algorithmic Improvements · 2023-05-29T10:56:07.162Z · LW · GW

Jeffrey Ladish has compatible and unsurprising reactions here.


The link is broken.

Comment by Mateusz Bagiński (mateusz-baginski) on Open Thread With Experimental Feature: Reactions · 2023-05-25T09:56:28.469Z · LW · GW

Maybe also a reminder about the comments to which you've reacted with that. E.g., if you haven't replied in a week or so (could be configured per user or something)

Comment by Mateusz Bagiński (mateusz-baginski) on Towards Measures of Optimisation · 2023-05-21T14:45:40.634Z · LW · GW

Unfortunately it has a problem of its own: it's sensitive to our choice of the set of possible outcomes. By adding some made-up element to that set with large negative utility and zero probability of occurring, we can make OP arbitrarily low. In that case basically all of the default relative expected utility comes from avoiding the worst outcome, which is guaranteed, so you don't get any credit for optimising.


What if we measure the utility of an outcome relative not to the worst one but to the status quo, i.e., the outcome that would happen if we did nothing/took the null action?

In that case, adding or subtracting outcomes to/from the set of possible outcomes doesn't change OP for the outcomes that were already in it, as long as the default outcome also remains in the set.

Obviously, this means that OP for any outcome depends on the choice of the default outcome. But I think that's OK? If I have $1000 and increase my wealth to $1,000,000, then I think I "deserve" to be assigned more optimization power than if I had $1,000,000 and did nothing, even if the absolute utility I get from having $1,000,000 is the same.
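Here's a minimal sketch of that proposal under an assumed (hypothetical) form of the measure, OP(a) = (E[U|a] - U(anchor)) / (U_max - U(anchor)); the post's actual definition may differ, so this only illustrates how the choice of the anchor outcome matters:

```python
# Hypothetical normalized measure of optimization power, relative to an anchor
# outcome; the symbols and exact definition in the original post may differ.

def op(action_dist, utilities, anchor):
    """Normalized expected utility of an action, relative to an anchor outcome."""
    eu = sum(p * utilities[o] for o, p in action_dist.items())
    u_max = max(utilities.values())
    return (eu - utilities[anchor]) / (u_max - utilities[anchor])

utilities = {"great": 10.0, "default": 0.0, "bad": -5.0}
optimiser = {"great": 1.0}     # reliably achieves the best outcome
do_nothing = {"default": 1.0}  # the status-quo / null action

# Anchored at the worst outcome, the optimiser clearly beats doing nothing:
gap_before = op(optimiser, utilities, "bad") - op(do_nothing, utilities, "bad")

# Add a made-up outcome with enormous negative utility and zero probability:
utilities["catastrophe"] = -1e9

# Re-anchoring at the new worst outcome collapses the gap: almost all "credit"
# now comes from avoiding the impossible catastrophe.
gap_after = op(optimiser, utilities, "catastrophe") - op(do_nothing, utilities, "catastrophe")

# Anchoring at the status-quo outcome instead is unaffected by the new outcome.
op_default = op(optimiser, utilities, "default")
```

With the worst-outcome anchor, the difference between the optimiser and the null action shrinks by orders of magnitude after the made-up outcome is added, while the status-quo anchor leaves the measure untouched.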

Comment by Mateusz Bagiński (mateusz-baginski) on The Eden Project · 2023-05-12T17:09:03.990Z · LW · GW

Is this post supposed to communicate something like Stranger Than History?

Comment by Mateusz Bagiński (mateusz-baginski) on TED talk by Eliezer Yudkowsky: Unleashing the Power of Artificial Intelligence · 2023-05-07T06:28:47.084Z · LW · GW

Correction: this is TEDx, a more local, less official version of TED. Edit: apparently it's TED after all; I only looked at the channel name, sorry for the confusion.

Comment by Mateusz Bagiński (mateusz-baginski) on Transcript and Brief Response to Twitter Conversation between Yann LeCunn and Eliezer Yudkowsky · 2023-04-26T17:59:23.024Z · LW · GW

Zvi will update the post if Yann responds further in the thread with Eliezer, but there will be no new Zvi posts centered on Yann.

Comment by Mateusz Bagiński (mateusz-baginski) on Paths to failure · 2023-04-25T08:58:39.100Z · LW · GW

Note that such a constellation would likely be unstable if the intelligence and capabilities of the AI increase over time, leading to a situation where the humans in the man-machine-system depend more and more on the AI and are less and less in control, up to the point where humans are not needed anymore and the uncontrollable man-machine-system transforms into an uncontrollable autonomous AI.

It would probably be quite easy to train a GPT (e.g., a decision transformer) to predict the actions made by the human components of the system, so the assumptions required for claiming that such a system would be unstable are minimal.

Comment by Mateusz Bagiński (mateusz-baginski) on Research agenda: Formalizing abstractions of computations · 2023-04-22T12:08:03.921Z · LW · GW

Reading this, it dawned on me that we tend to talk about the human-understandability of NNs as if it were magic.[1] I mean, how much is there to human-understandability (in the sense of sufficiently intelligent, but not extremely rare genius-level, humans being able to work with a phenomenon and model it accurately at a sufficient level of granularity) beyond the complexity of that phenomenon and the computational reducibility of its dynamics?

Sure, we find some ways of representing information and thinking more intuitive and easier to grasp than others but we can learn new ones that would seem pretty alien to a naive hunter-gatherer mind, like linear algebra or category theory.

  1. Or perhaps it's just me and this is background knowledge for everybody else, whatever. ↩︎

Comment by Mateusz Bagiński (mateusz-baginski) on The Computational Anatomy of Human Values · 2023-04-12T12:54:29.032Z · LW · GW

One of the most interesting posts I've read over the last couple of months


a general purpose multimodal world model which contains both latent representations highly suited to predicting the world (due to the unsupervised sensory cortices) as well as an understanding of a sense of self due to the storage and representation of large amounts of autobiographical memory.

Doesn't this imply that people with exceptionally weak autobiographical memory (e.g., Eliezer) have less self-understanding/sense of self? Or maybe you think this memory is largely implicit, not explicit? Or maybe it's enough to have just a bit of it and it doesn't "impair" unless you go very low?


One thing that your model of unsupervised learning of the world model(s) doesn't mention is that humans apparently have strong innate inductive biases for inferring the presence of norms and behaving based on their perception of those norms (e.g., by punishing transgressors), even when they're not socially incentivized to do so (see this SEP entry).[1] I guess you would explain it as some hardcoded midbrain/brainstem circuit that encourages increased attention to socially salient information, driving norm inference and the development of value concepts, which then get associatively saturated with valence and plugged into the same or some other socially relevant circuits for driving behavior?


[…] the natural abstraction hypothesis is strongly true and values are a natural abstraction so that in general the model will tend to stay within the basin of ‘sensible-ish’ human values which is where the safety margins is. Moreover, we should expect this effect to improve with scale, since the more powerful models might have crisper and less confused internal concepts.

I'm not sure. It's not obvious to me that more powerful models won't be able to model human behavior using abstractions very unlike human values, and possibly quite incomprehensible to us.


my hypothesis is that human values are primarily socially constructed and computationally exist primarily in webs of linguistic associations (embeddings) in the cortex (world model) in an approximately linear vector space.

Can you elaborate on what it means for concepts encoded in the cortex to exist in a ~linear vector space? What would a world where that wasn't the case look like?
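For what it's worth, here's a toy sketch of what "concepts in an approximately linear vector space" is usually taken to mean: semantic relations behave like vector offsets. The embeddings below are hand-made for illustration (not learned from any model), so treat this purely as a picture of the claim, not evidence for it.

```python
import numpy as np

# Hand-constructed toy embeddings: each concept is a point in a shared
# space whose axes we pretend encode [royalty, maleness].
emb = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, 0.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, 0.0]),
}

# In a ~linear concept space, relations are offsets:
# "king" - "man" + "woman" lands on (or near) "queen".
candidate = emb["king"] - emb["man"] + emb["woman"]
print(np.allclose(candidate, emb["queen"]))  # True
```

A world where this wasn't the case would be one where concept representations compose in some highly non-linear way, so that no such offset arithmetic approximately holds.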

  1. Interestingly, this "promiscuous normativity", as it's sometimes called, leads us to conflate the normal and the moral (sometimes called the "normal is moral" bias, see Knobe, 2019, mostly pages 561-562), which is not surprising in your model. ↩︎

Comment by Mateusz Bagiński (mateusz-baginski) on The Computational Anatomy of Human Values · 2023-04-12T08:59:14.425Z · LW · GW

Ok, so what you mean by 2) is something like reflective game-theoretic equilibrium for that particular environment (including social environment)?

Comment by Mateusz Bagiński (mateusz-baginski) on The Computational Anatomy of Human Values · 2023-04-12T08:25:52.106Z · LW · GW

This is fair. I personally have very low odds on success but it is not a logical impossibility.

I'd say that the probability of success depends on

(1) Conservatism - how much of the prior structure (i.e., what our behavior actually looks like at the moment, how it's driven by particular shards, etc.) you want to preserve. The more conservative you are, the harder it is.

(2) Parametrization - how many moving parts (e.g., values in value consequentialism or virtues in virtue ethics) you allow for in your desired model - the more, the easier.

If you want to explain all of human behavior and reduce it to one metric only, the project is doomed.[1]

For some values of (1) and (2) you can find one or more coherent extrapolations of human values/value concepts. The thing is, often there's not one extrapolation that is clearly better for one particular person and the greater the number of people whose values you want to extrapolate, the harder it gets. People differ in what extrapolation they would prefer (or even if they would like to extrapolate away from their status quo common sense ethics) due to different genetics, experiences, cultural influences, pragmatic reasons etc.

  1. There may also be some misunderstanding if one side assumes that the project is descriptive (adequately describe all of human behavior with a small set of latent value concepts) and the other assumes it is prescriptive (provide a unified, coherent framework that retains some part of our current value system but makes it more principled, robust against moving out of distribution, etc.) ↩︎

Comment by Mateusz Bagiński (mateusz-baginski) on The Computational Anatomy of Human Values · 2023-04-12T07:50:59.395Z · LW · GW

If I understand Beren's model correctly, [2] falls within the innate, hardcoded reward circuits in the hypothalamus and the brainstem.

Beren touches on [3] discussing them as attempted generalizations/extrapolations of [1] into a coherent framework, such as consequences, utility, adherence to some set of explicitly stated virtues, etc.

Comment by Mateusz Bagiński (mateusz-baginski) on The Computational Anatomy of Human Values · 2023-04-11T16:26:41.033Z · LW · GW

These linguistic concepts become occupy important and salient positions in the latent space of the world model.


Typo: probably "become important and occupy salient..." or "occupy important and salient"

(feel free to delete this comment after fixing [or not fixing])


EDIT: another



Comment by Mateusz Bagiński (mateusz-baginski) on Ng and LeCun on the 6-Month Pause (Transcript) · 2023-04-09T17:49:20.107Z · LW · GW

I think wanting to seem like sober experts makes them kinda believe the things they expect other people to expect to hear from sober experts.

Comment by Mateusz Bagiński (mateusz-baginski) on SERI MATS - Summer 2023 Cohort · 2023-04-09T10:09:49.825Z · LW · GW

There's most likely going to be a winter cohort starting in November

Comment by Mateusz Bagiński (mateusz-baginski) on Anthropic is further accelerating the Arms Race? · 2023-04-07T18:26:38.279Z · LW · GW

Reading this, it reminds me of the red flags that some people (e.g. Soares) saw when interacting with SBF and, once shit hit the fan, ruminated over not having taken some appropriate action.

Comment by Mateusz Bagiński (mateusz-baginski) on Lessons from Convergent Evolution for AI Alignment · 2023-03-28T12:20:49.919Z · LW · GW

Standardized communication protocols

Language is the most obvious example, but there are plenty of others. E.g., if we treat different parts of the body as subsystems communicating with each other, one neurotransmitter/hormone often has very similar effects in many of them.

In software, different processes can communicate with each other by passing messages in some well-defined format. When you're sending an API request, you usually have a good idea of what shape the response is going to take, and if the request fails, it should fail in a predictable way that can be harmlessly handled. This makes it easier to build reliable software.
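The point about well-defined message formats can be sketched like this (a minimal illustration with made-up names, not any real API):

```python
from dataclasses import dataclass
from typing import Union

# A standardized protocol: every response is either a success carrying a
# payload or a failure carrying a machine-readable error code, so callers
# always know what shapes to expect.
@dataclass
class Ok:
    payload: dict

@dataclass
class Err:
    code: str      # e.g. "NOT_FOUND", "TIMEOUT"
    message: str

Response = Union[Ok, Err]

def handle_request(store: dict, key: str) -> Response:
    """Look up a key; failure takes a predictable, handleable shape."""
    if key not in store:
        return Err(code="NOT_FOUND", message=f"no entry for {key!r}")
    return Ok(payload={"key": key, "value": store[key]})

resp = handle_request({"a": 1}, "b")
if isinstance(resp, Err):
    # The failure mode is part of the protocol, so it can be
    # handled harmlessly instead of crashing the caller.
    result = None
else:
    result = resp.payload["value"]
```

Because both the success and failure shapes are fixed in advance, the caller's handling code can be written (and tested) without ever seeing the server's internals.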

Some cases of standardization are spontaneous/bottom-up, whereas others are engineered top-down. Human language is both. Languages with a greater number of users seem to evolve simpler, more standardized grammars, e.g. compare Russian to Czech or English to Icelandic (though syncretism and promiscuous borrowing may also have had an impact in the second case). I don't know if something like that occurs at all in programming languages, but one factor that makes it much less likely is the need to maintain backward-compatibility, which is important for programming languages but much weaker for human languages.

Comment by Mateusz Bagiński (mateusz-baginski) on Is “FOXP2 speech & language disorder” really “FOXP2 forebrain fine-motor crappiness”? · 2023-03-24T18:26:03.529Z · LW · GW

Human speech and bird song are both cases of vocal learning. They are (at least) largely learned, more complex, and require more precise control, whereas mouse vocalizations are mostly hardwired.

Comment by Mateusz Bagiński (mateusz-baginski) on Is “FOXP2 speech & language disorder” really “FOXP2 forebrain fine-motor crappiness”? · 2023-03-23T16:57:43.960Z · LW · GW

Is it actually true that speech is by far the most fast & complicated & intricate & precise motor control task that a human routinely does? I mean, I think so? But I’m not sure how to prove or quantify that

Hmm, I think it is, but anecdotally speaking it seems to me that you can be quite clumsy, have bad coordination, etc., but have no speech impairment.

Comment by Mateusz Bagiński (mateusz-baginski) on Mateusz Bagiński's Shortform · 2023-03-18T06:39:18.027Z · LW · GW

That's also what this meta-analysis found but I was mostly wondering about social cognition deficits (though looking back I see it's not clear in the original shortform)

Comment by Mateusz Bagiński (mateusz-baginski) on Mateusz Bagiński's Shortform · 2023-03-18T06:36:35.995Z · LW · GW

Huh, you're right. I thought most fruits have enough to cover daily requirements.

Comment by Mateusz Bagiński (mateusz-baginski) on What's the Relationship Between "Human Values" and the Brain's Reward System? · 2023-03-17T18:27:45.534Z · LW · GW

Avoiding wireheading doesn't seem like failed inner alignment - avoiding wireheading now can allow you to get even more pleasure in the future because wireheading makes you vulnerable/less powerful.

Even if this is the case, this is not why (most) humans don't want to wirehead, in the same way that their objection to killing an innocent person whose organs could save 10 other people is not driven by some elaborate utilitarian argument that this would be bad for society.

Comment by Mateusz Bagiński (mateusz-baginski) on Mateusz Bagiński's Shortform · 2023-03-14T17:42:53.455Z · LW · GW

Does severe vitamin C deficiency (i.e. scurvy) lead to oxytocin depletion?

According to Wikipedia

The activity of the PAM enzyme [necessary for releasing oxytocin from the neuron] system is dependent upon vitamin C (ascorbate), which is a necessary vitamin cofactor.

I.e., if you don't have enough vitamin C, your neurons can't release oxytocin. Common-sensically, this should lead to some psychological/neurological problems, maybe with empathy/bonding/social cognition?

Quick googling "scurvy mental problems" or "vitamin C deficiency mental symptoms" doesn't return much on that. This meta-analysis finds some association of sub-scurvy vitamin C deficiency with depression, mood problems, worse cognitive functioning and some other psychiatric conditions but no mention of what I'd suspect from lack of oxytocin. Possibly oxytocin is produced in low enough levels that this doesn't really matter because you need very little vit C? But on the other hand (Wikipedia again)

By chance, sodium ascorbate by itself was found to stimulate the production of oxytocin from ovarian tissue over a range of concentrations in a dose-dependent manner.

So either this (i.e. disturbed social cognition) is not how we should expect oxytocin deficiencies to manifest or vitamin C deficiency manifests in so many ways in the brain that you don't even bother with "they have worse theory of mind than when they ate one apple a day".

Comment by Mateusz Bagiński (mateusz-baginski) on Open & Welcome Thread — March 2023 · 2023-03-13T10:32:20.812Z · LW · GW

Has anybody converted into ebook format or is working on it?

Comment by Mateusz Bagiński (mateusz-baginski) on Chaos Induces Abstractions · 2023-03-13T10:11:03.359Z · LW · GW

I'm not sure if that's what you mean, but genes and agency are both perfectly valid abstractions IMO. An abstraction can summarize the state of a small part of the system (e.g., genes), not necessarily the entire system (e.g., temperature).

Comment by Mateusz Bagiński (mateusz-baginski) on Schizophrenia as a deficiency in long-range cortex-to-cortex communication · 2023-03-01T08:53:59.756Z · LW · GW

Hm, I had a vague memory that contrast detection relies on something like lateral inhibition but when I thought about it a bit more it doesn't really make sense and I guess I conflated it with edge detection in the retina.

Regarding cerebellum in hearing voices: If I understand your model correctly, it goes something like this. Region S (sender) "generates voices" and region R (receiver) "hears voices" generated by S. R expects to receive those signals from S (or maybe even just expects to receive these kinds of signals in general, without specifying where they come from). R gets surprised when it receives unexpected signals and interprets them as "not mine". R would expect to receive them, if it first got a message "hey, S is soon going to send some voice-signals to you". Isn't this exactly the role of the cerebellum, to learn that, e.g. if S activates in this particular way (about to "generate voices"), then R will soon activate in the other way ("hears voices") and therefore it would make sense to preempt R, so that it can expect to get that particular signal from S and act accordingly even before receiving that signal?

Comment by Mateusz Bagiński (mateusz-baginski) on A mostly critical review of infra-Bayesianism · 2023-03-01T08:03:08.825Z · LW · GW

(according to Vanessa there are only three people in the world who fully understand the infra-Bayesian sequence)


Vanessa, Diffractor, and who is the third one?