Posts

Less Anti-Dakka 2024-05-31T09:07:10.450Z
Some Problems with Ordinal Optimization Frame 2024-05-06T05:28:42.736Z
What are the weirdest things a human may want for their own sake? 2024-03-20T11:15:09.791Z
Three Types of Constraints in the Space of Agents 2024-01-15T17:27:27.560Z
'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata 2023-11-15T16:00:48.926Z
Charbel-Raphaël and Lucius discuss interpretability 2023-10-30T05:50:34.589Z
"Wanting" and "liking" 2023-08-30T14:52:04.571Z
GPTs' ability to keep a secret is weirdly prompt-dependent 2023-07-22T12:21:26.175Z
How do you manage your inputs? 2023-03-28T18:26:36.979Z
Mateusz Bagiński's Shortform 2022-12-26T15:16:17.970Z
Kraków, Poland – ACX Meetups Everywhere 2022 2022-08-24T23:07:07.542Z

Comments

Comment by Mateusz Bagiński (mateusz-baginski) on Aligning AI by optimizing for "wisdom" · 2024-10-15T09:36:52.188Z · LW · GW

Based on my attending Oliver's talk, this may be relevant/useful:

Comment by Mateusz Bagiński (mateusz-baginski) on Rationalism before the Sequences · 2024-10-09T15:57:16.423Z · LW · GW

https://en.wikipedia.org/wiki/X_Club

Comment by Mateusz Bagiński (mateusz-baginski) on A Narrow Path: a plan to deal with AI extinction risk · 2024-10-08T13:55:34.866Z · LW · GW

I too have reservations about points 1 and 3 but not providing sufficient references or justifications doesn't imply they're not on SL1.

Comment by Mateusz Bagiński (mateusz-baginski) on Overview of strong human intelligence amplification methods · 2024-10-08T12:10:33.443Z · LW · GW

mentioned in the FAQ

Comment by Mateusz Bagiński (mateusz-baginski) on Consciousness As Recursive Reflections · 2024-10-06T02:43:00.425Z · LW · GW

(I see what podcasts you listen to.)

Comment by Mateusz Bagiński (mateusz-baginski) on Shortform · 2024-09-30T17:37:54.877Z · LW · GW

My notion of progress is roughly: something that is either a building block for The Theory (i.e. marginally advancing our understanding) or a component of some solution/intervention/whatever that can be used to move probability mass from bad futures to good futures.

Re the three you pointed out: simulators I consider a useful insight, gradient hacking probably not (10% < p < 20%), and activation vectors I put in the same bin as RLHF, whatever the appropriate label for that bin is.

Comment by Mateusz Bagiński (mateusz-baginski) on Shortform · 2024-09-30T13:28:57.540Z · LW · GW

Also, I'm curious what it is that you consider(ed) AI safety progress/innovation. Can you give a few representative examples?

Comment by Mateusz Bagiński (mateusz-baginski) on Shortform · 2024-09-30T09:56:16.742Z · LW · GW
  • the approaches that have been attracting the most attention and funding are dead ends

Comment by Mateusz Bagiński (mateusz-baginski) on Ruby's Quick Takes · 2024-09-28T16:49:36.620Z · LW · GW

I'd love to try it, mainly thinking about research (agent foundations and AI safety macrostrategy).

Comment by Mateusz Bagiński (mateusz-baginski) on peterbarnett's Shortform · 2024-09-27T10:26:42.945Z · LW · GW

I propose "token surprise" (as in type-token distinction). You expected this general type of thing but not that Ivanka would be one of the tokens instantiating it.

Comment by Mateusz Bagiński (mateusz-baginski) on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It. · 2024-09-20T10:58:07.178Z · LW · GW

It's better but still not quite right. When you play on two levels, sometimes the best strategy involves a pair of (level 1 and level 2) substrategies that are seemingly opposites of each other. I don't think there's anything hypocritical about that.

Similarly, hedging is not hypocrisy.

Comment by Mateusz Bagiński (mateusz-baginski) on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It. · 2024-09-11T06:21:23.700Z · LW · GW

Do you think [playing in a rat race because it's the locally optimal thing for an individual to do, while at the same time advocating for abolishing the rat race] is an example of reformative hypocrisy?

Or even more broadly, defecting in a prisoner's dilemma while exposing an interface that would allow cooperation with other like-minded players?

I've had this concept for many years and it hasn't occurred to me to give it a name (How Stupid Not To Have Thought Of That) but if I tried to give it a name, I definitely wouldn't call it a kind of hypocrisy.

Comment by Mateusz Bagiński (mateusz-baginski) on tailcalled's Shortform · 2024-09-10T01:24:13.780Z · LW · GW

It's not clear to me how this results from "excess resources for no reasons". I guess the "for no reasons" part is crucial here?

Comment by Mateusz Bagiński (mateusz-baginski) on The Information: OpenAI shows 'Strawberry' to feds, races to launch it · 2024-09-05T19:44:06.331Z · LW · GW

I meant this strawberry problem.

Comment by Mateusz Bagiński (mateusz-baginski) on A bet for Samo Burja · 2024-09-05T16:10:05.513Z · LW · GW

Samo said that he would bet that AGI is coming perhaps in the next 20-50 years, but in the next 5.

I haven't listened to the pod yet but I guess you meant "but not in the next 5".

Comment by Mateusz Bagiński (mateusz-baginski) on Richard Ngo's Shortform · 2024-09-04T00:07:01.039Z · LW · GW

FWIW Oliver's presentation of (some fragment of) his work at ILIAD was my favorite of all the talks I attended at the conference.

Comment by Mateusz Bagiński (mateusz-baginski) on The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review · 2024-08-28T13:41:09.080Z · LW · GW

I am not totally sure why he considers discrete models to be unable to describe initial states or state-transition programming.

AFAIU, he considers them inadequate because they rely on an external interpreter, whereas the model of reality should be self-interpreting because there is nothing outside of reality to interpret it.

Wheeler suggests some principles for constructing a satisfactory explanation. The first is that "The boundary of a boundary is zero": this is an algebraic topology theorem showing that, when taking a 3d shape, and then taking its 2d boundary, the boundary of the 2d boundary is nothing, when constructing the boundaries in a consistent fashion that produces cancellation; this may somehow be a metaphor for ex nihilo creation (but I'm not sure how).

See this as an operation that takes a shape and produces its boundary. It goes 3D shape -> 2D shape -> nothing. If you reverse the arrows, you get nothing -> 2D shape -> 3D shape. (Of course, it's not quite right because (IIUC) all closed 2D surfaces have boundary zero, but I guess it's just meant as a rough analogy.)
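
A concrete instance of the theorem (standard algebraic topology, nothing specific to Langan's framework): take the solid ball $B^3$; its boundary is the sphere $S^2$; and the sphere itself has no boundary.

$$\partial B^3 = S^2, \qquad \partial S^2 = \emptyset, \qquad \text{so} \quad \partial(\partial B^3) = \emptyset$$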

He notes a close relationship between logic, cognition, and perception: for example, "X | !X" when applied to perception states that something and its absence can't both be perceived at once

This usage of logical operators is confusing. In the context of perception, he seems to want to talk about NAND: you never perceive both something and its absence but you may also not perceive either. 

(note that "X | !X" is equivalent to "!(X & !X)" in classical but not intuitionistic logic)

Intuitionistic logic doesn't allow "X & !X" either.[1] It allows "!(X & !X)".
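
To make the two readings concrete, here's a minimal sketch (my own illustration, treating "perceiving X" and "perceiving the absence of X" as two separate booleans; that framing is an assumption about what's meant, not Langan's notation):

```python
from itertools import product

# Enumerate all combinations of "X is perceived" and "the absence of X is perceived".
for p_x, p_not_x in product([False, True], repeat=2):
    or_reading = p_x or p_not_x            # "X | !X": at least one of them is perceived
    nand_reading = not (p_x and p_not_x)   # NAND: they are never perceived together
    print(f"{p_x!s:>5} {p_not_x!s:>5}  OR={or_reading!s:>5}  NAND={nand_reading!s:>5}")

# The (False, False) row satisfies the NAND reading but violates the OR reading,
# which is why NAND fits "you never perceive both, but you may also perceive neither".
```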

Langan contrasts between spatial duality principles ("one transposing spatial relations and objects") and temporal duality principles ("one transposing objects or spatial relations with mappings, functions, operations or processes"). This is now beyond my own understanding.

It's probably something like: if you have a spatial relationship between two objects X and Y, you can view it as an object with X and Y as endpoints. Temporally, if X causes Y, then you can see it as a function/process that, given X, produces Y.


The most confusing/unsatisfying thing for me about CTMU (to the extent that I've engaged with it so far) is that it doesn't clarify what "language" is. It points ostensively at examples: formal languages, natural languages, science, perception/cognition, which apparently share some similarities but what are those similarities?

  1. ^

    Though paraconsistent logic does.

Comment by Mateusz Bagiński (mateusz-baginski) on The Information: OpenAI shows 'Strawberry' to feds, races to launch it · 2024-08-28T03:38:07.127Z · LW · GW

Did they name it after the strawberry problem!?

Comment by Mateusz Bagiński (mateusz-baginski) on Wei Dai's Shortform · 2024-08-26T07:41:49.998Z · LW · GW

Here are some axes along which I think there's some group membership signaling in philosophy (IDK about the extent and it's hard to disentangle it from other stuff):

  • Math: platonism/intuitionism/computationalism (i.e. what is math?), interpretations of probability, foundations of math (set theory vs univalent foundations)
  • Mind: externalism/internalism (about whatever), consciousness (de-facto-dualisms (e.g. Chalmers) vs reductive realism vs illusionism), language of thought vs 4E cognition, determinism vs compatibilism vs voluntarism
  • Metaphysics/ontology: are chairs, minds, and galaxies real? (this is somewhat value-laden for many people)
  • Biology: gene's-eye-view/modern synthesis vs extended evolutionary synthesis

Comment by Mateusz Bagiński (mateusz-baginski) on A Case for the Least Forgiving Take On Alignment · 2024-08-25T12:57:44.012Z · LW · GW

Moreover, I don't think that some extra/different planning machinery was required for language itself, beyond the existing abstraction and model-based RL capabilities that many other animals share.

I would expect to see sophisticated ape/early-hominid-level culture in many more species if that were the case. For some reason humans went on the culture RSI trajectory whereas other animals didn't. Plausibly there was some seed cognitive ability (plus some other contextual enablers) that allowed a gene-culture "coevolution" cycle to start.

Comment by Mateusz Bagiński (mateusz-baginski) on Open Thread Summer 2024 · 2024-08-25T10:02:50.964Z · LW · GW

My feedback is that I absolutely love it. My favorite feature released since reactions or audio for all posts (whichever was later).

Comment by Mateusz Bagiński (mateusz-baginski) on Extended Interview with Zhukeepa on Religion · 2024-08-24T16:54:19.324Z · LW · GW

In other words, there's a question about how to think about truth in a way that honors perspectivalism, while also not devolving into relativism. And the way Jordan and I were thinking about this, was to have each filter bubble -- with their own standards of judgment for what's true and what's good -- to be fed the best content from the other filter bubbles by the standards from within each filter bubble, rather than the worst content, which is more like what we see with social media today.

 

Seems like Monica Anderson was trying to do something like that with BubbleCity. (pdf, podcast)

Comment by Mateusz Bagiński (mateusz-baginski) on What are the best resources for building gears-level models of how governments actually work? · 2024-08-19T17:23:27.543Z · LW · GW

This is not quite an answer to your question but some recommendations in the comments to this post may be relevant: https://www.lesswrong.com/posts/SCs4KpcShb23hcTni/ideal-governance-for-companies-countries-and-more

Comment by Mateusz Bagiński (mateusz-baginski) on Alex_Altair's Shortform · 2024-08-18T16:40:49.367Z · LW · GW

It does for me

Comment by Mateusz Bagiński (mateusz-baginski) on Raemon's Shortform · 2024-08-17T07:42:48.993Z · LW · GW

Sometimes I look up a tag/concept to ensure that I'm not spouting nonsense about it.

But most often I use them to find the posts related to a topic I'm interested in.

Comment by Mateusz Bagiński (mateusz-baginski) on Self-Other Overlap: A Neglected Approach to AI Alignment · 2024-08-13T19:41:55.076Z · LW · GW

Possible problems with this approach

 

One failure mode you seem to have missed (which I'm surprised by) is that the SOO metric may be Goodhart-able. It might be the case (for all we know) that by getting the model to maximize SOO (subject to the constraint of sufficiently preserving performance, etc.), you incentivize it to encode the self-other distinction in some convoluted way that is not adequately captured by the SOO metric, but is sufficient for deception.

Comment by Mateusz Bagiński (mateusz-baginski) on Leaving MIRI, Seeking Funding · 2024-08-08T19:00:11.638Z · LW · GW

Radical probabilist

Paperclip minimizer

Child of LDT

Dragon logician

Embedded agent

Hufflepuff cynic

Logical inductor

Bayesian tyrant

Comment by Mateusz Bagiński (mateusz-baginski) on Alexander Gietelink Oldenziel's Shortform · 2024-08-07T13:09:13.117Z · LW · GW

Asexual species universally seem to have come into being very recently. They likely go extinct due to lack of genetic diversity and attendant mutational load catastrophe and/or losing arms races with parasites.

Bdelloidea are an interesting counterexample: they evolved obligate parthenogenesis ~25 mya.

Comment by Mateusz Bagiński (mateusz-baginski) on Near-mode thinking on AI · 2024-08-06T07:31:17.724Z · LW · GW

There's a famous prediction market about whether AI will get gold from the International Mathematical Olympiad by 2025.

correction: it's by the end of 2025

Comment by Mateusz Bagiński (mateusz-baginski) on Zach Stein-Perlman's Shortform · 2024-08-04T16:12:50.177Z · LW · GW

Also, they failed to provide the promised fraction of compute to the Superalignment team (and not because it was needed for non-Superalignment safety stuff).

Comment by Mateusz Bagiński (mateusz-baginski) on Alexander Gietelink Oldenziel's Shortform · 2024-07-30T10:36:06.442Z · LW · GW

Well, past events (before some time t) kind of obviously can't be included in the Markov blanket at time t.

As far as I understand it, the MB formalism captures only momentary causal interactions between "Inside" and "Outside", but doesn't capture a kind of synchronicity/fine-tuning-ish statistical dependency that doesn't manifest in the current causal interactions (across the Markov blanket) but is caused by past interactions.

For example, if you learned a perfect weather forecast for the next month and then went into a completely isolated bunker but kept track of what day it was, your beliefs and the actual weather would be strongly dependent, even though there's no causal interaction (after you entered the bunker) between your beliefs and the weather. MBs therefore miss this kind of dependency, and CBs aim to capture it.
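
A minimal sketch of the bunker example (my own illustration; the "perfect forecast" assumption and all the names are mine, not part of the MB/CB formalisms):

```python
import random

random.seed(0)
samples = []
for _ in range(10_000):
    # Everything here happens before t0, while causal interaction is still possible:
    forecast = [random.choice(["sun", "rain"]) for _ in range(30)]
    belief = list(forecast)   # the agent memorizes the forecast, then enters the bunker
    weather = list(forecast)  # assume a perfect forecast: the weather follows it exactly

    # After t0 there is no interaction between belief and weather, only shared history.
    day = random.randrange(30)
    samples.append((belief[day], weather[day]))

agreement = sum(b == w for b, w in samples) / len(samples)
print(agreement)  # 1.0: maximal dependence despite zero belief-weather interaction after t0
```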

Comment by Mateusz Bagiński (mateusz-baginski) on Generalizing Foundations of Decision Theory · 2024-07-22T11:47:50.659Z · LW · GW

(Continuity.) If $A \preceq B \preceq C$, then there exists $p \in [0, 1]$ such that a gamble $G$ assigning probability $p$ to $A$ and $1 - p$ to $B$ satisfies $G \sim B$.

Should be "$1 - p$ to $C$".
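
For reference, the standard vNM continuity axiom, in my paraphrase (the post's exact formulation and symbols may differ):

$$A \preceq B \preceq C \;\Longrightarrow\; \exists\, p \in [0, 1] \,:\; p\,A + (1 - p)\,C \;\sim\; B$$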

Comment by Mateusz Bagiński (mateusz-baginski) on Open Thread Summer 2024 · 2024-07-22T08:59:54.604Z · LW · GW

Actually, it might be it, thanks!

Comment by Mateusz Bagiński (mateusz-baginski) on Schelling points in the AGI policy space · 2024-07-21T08:53:13.437Z · LW · GW

The Schelling-point-ness of these memes seems to me to be secondary to (all inter-related):

  • memetic fit (within a certain demographic, conditional on the person/group already adopting certain beliefs/attitudes/norms etc)
  • being a self-correcting/stable attractor in the memespace
  • being easy to communicate and hold all the relevant parts in your mind at once

You discuss all of that but I read the post as saying something like "we need Schelling points, therefore we have to produce memetic attractors to serve as such Schelling points", whereas I think that typically first a memeplex emerges, and then people start coordinating around it without much reflection. (Well, arguably this is true of most Schelling points.)


Here's one more idea that I think I've seen mentioned somewhere and so far hasn't spread, but might become a Schelling point:

AI summer - AI has brought a lot of possibilities but the road further ahead is fraught with risks. Let's therefore pause fundamental research and focus on reaping the benefits of the state of AI that we already have.

Comment by Mateusz Bagiński (mateusz-baginski) on Open Thread Summer 2024 · 2024-07-21T06:57:24.706Z · LW · GW

I think I saw a LW post that was discussing alternatives to the vNM independence axiom. I also think (low confidence) it was by Rob Bensinger and in response to Scott's geometric rationality (e.g. this post). For the life of me, I can't find it. Unless my memory is mistaken, does anybody know what I'm talking about?

Comment by Mateusz Bagiński (mateusz-baginski) on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T10:56:35.985Z · LW · GW

ideas of Eigenmorality and Eigenkarma[3].

broken links

Comment by Mateusz Bagiński (mateusz-baginski) on A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team · 2024-07-18T14:18:00.695Z · LW · GW

recently[1].

empty footnote

Comment by Mateusz Bagiński (mateusz-baginski) on LLMs as a Planning Overhang · 2024-07-17T07:43:03.831Z · LW · GW

What kind of interpretability work do you consider plausibly useful or at least not counterproductive?

Comment by Mateusz Bagiński (mateusz-baginski) on Corrigibility = Tool-ness? · 2024-07-16T09:51:32.295Z · LW · GW

2. Task uncertainty with reasonable prior on goal drift - the system is unsure about the task it tries to do and seeks human inputs about it. 

“Task uncertainty with reasonable prior…” sounds to me like an overly-specific operationalization, but I think this desideratum is gesturing at visibility/correctability.

To me, "unsure about the task it tries to do" sounds more like applicability to a wide range of problems.

Comment by Mateusz Bagiński (mateusz-baginski) on LLMs as a Planning Overhang · 2024-07-16T04:46:51.095Z · LW · GW

useless or counterproductive things due to missing it.

What kind of work are you thinking of?

Comment by Mateusz Bagiński (mateusz-baginski) on Alexander Gietelink Oldenziel's Shortform · 2024-07-06T16:21:05.471Z · LW · GW

Formal frameworks considered in isolation can't be wrong. Still, they often come with some claims like "framework F formalizes some intuitive (desirable?) property or specifies the right way to do some X and therefore should be used in such-and-such real-world situations". These can be disputed, and I expect that when somebody claims something like "{Bayesianism, utilitarianism, classical logic, etc.} is wrong", that's what they mean.

Comment by Mateusz Bagiński (mateusz-baginski) on Loving a world you don’t trust · 2024-06-27T09:20:44.902Z · LW · GW

(Vague shower thought, not standing strongly behind it)

Maybe it is the case that most people as individuals "just want frisbee and tea" but once religion (or rather the very broad class of ~"social practices" some subset/projection of which we round up to "religion") evolved and lowered the activation energy of people's hive switch, they became more inclined to appreciate the beauty of Cathedrals and Gregorian chants, etc.

In other words, people's ability to want/appreciate/[see value/beauty in X] depends largely on the social structure they are embedded in, the framework they adopt to make sense of the world etc. (The selection pressures that led to religion didn't entirely reduce to "somebody wanting something", so at least that part is not question-begging [I think].)

Comment by Mateusz Bagiński (mateusz-baginski) on The Data Wall is Important · 2024-06-10T09:29:57.546Z · LW · GW

For good analysis of this, search for the heading “The data wall” here.

Did you mean to insert a link here?

Comment by Mateusz Bagiński (mateusz-baginski) on Announcing ILIAD — Theoretical AI Alignment Conference · 2024-06-06T06:33:48.906Z · LW · GW

Intentional
Lure for
Improvised
Acronym
Derivation

Comment by Mateusz Bagiński (mateusz-baginski) on Less Anti-Dakka · 2024-05-31T18:03:34.007Z · LW · GW

You're right, fixed, thanks!

Comment by Mateusz Bagiński (mateusz-baginski) on Non-Disparagement Canaries for OpenAI · 2024-05-31T09:23:25.123Z · LW · GW

In response, some companies began listing warrant canaries on their websites—sentences stating that they had never yet been forced to reveal any client data. If at some point they did receive such a warrant, they could then remove the canary without violating their legal non-disclosure obligation, thereby allowing the public to gain indirect evidence about this otherwise-invisible surveillance.

 

Can the gov force them not to remove the canary?

Comment by Mateusz Bagiński (mateusz-baginski) on [Linkpost] The Expressive Capacity of State Space Models: A Formal Language Perspective · 2024-05-28T15:18:50.671Z · LW · GW

It wasn't me but it's probably about spreading AI capabilities-relevant knowledge.

Comment by Mateusz Bagiński (mateusz-baginski) on simeon_c's Shortform · 2024-05-24T17:12:49.805Z · LW · GW

He openly stated that he had left OA because he lost confidence that they would manage the singularity responsibly. Had he signed the NDA, he would have been prohibited from saying that.

Comment by Mateusz Bagiński (mateusz-baginski) on Should we be concerned about eating too much soy? · 2024-05-22T18:39:37.540Z · LW · GW

According to this plant-based-leaning but also somewhat vegan-critical blog run by a sports nutritionist, eating 4 servings of soy per day (120 g of soybeans) is safe for male hormonal balance. It's in Polish but Google Translate should handle it. He cites studies. https://www.damianparol.com/soja-i-testosteron/

Comment by Mateusz Bagiński (mateusz-baginski) on robo's Shortform · 2024-05-20T14:15:14.791Z · LW · GW

from Eric Weinstein in a youtube video.

Can you link?