Posts

Less Anti-Dakka 2024-05-31T09:07:10.450Z
Some Problems with Ordinal Optimization Frame 2024-05-06T05:28:42.736Z
What are the weirdest things a human may want for their own sake? 2024-03-20T11:15:09.791Z
Three Types of Constraints in the Space of Agents 2024-01-15T17:27:27.560Z
'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata 2023-11-15T16:00:48.926Z
Charbel-Raphaël and Lucius discuss interpretability 2023-10-30T05:50:34.589Z
"Wanting" and "liking" 2023-08-30T14:52:04.571Z
GPTs' ability to keep a secret is weirdly prompt-dependent 2023-07-22T12:21:26.175Z
How do you manage your inputs? 2023-03-28T18:26:36.979Z
Mateusz Bagiński's Shortform 2022-12-26T15:16:17.970Z
Kraków, Poland – ACX Meetups Everywhere 2022 2022-08-24T23:07:07.542Z

Comments

Comment by Mateusz Bagiński (mateusz-baginski) on o3 · 2024-12-21T09:42:20.321Z · LW · GW

I guess one's timelines might have gotten longer if one had very high credence that the paradigm opened by o1 is a blind alley (relative to the goal of developing human-worker-omni-replacement-capable AI) but profitable enough that OA gets distracted from its official most ambitious goal.

I'm not that person.

Comment by Mateusz Bagiński (mateusz-baginski) on Open Thread Fall 2024 · 2024-12-18T11:02:18.499Z · LW · GW

React suggestion/request: "not joint-carving"/"not the best way to think about this topic".

This is kind of "(local) taboo those words" but it's more specific.

Comment by Mateusz Bagiński (mateusz-baginski) on Hierarchical Agency: A Missing Piece in AI Alignment · 2024-12-13T14:29:42.891Z · LW · GW

Compositional agency?

Comment by Mateusz Bagiński (mateusz-baginski) on Secular interpretations of core perennialist claims · 2024-12-09T19:27:33.200Z · LW · GW

terminal values in the first place, as opposed to active blind spots masquerading as terminal values.

Can't one's terminal values be exactly (mechanistically implemented as) active blind spots?

I predict that you would say something like: "The difference is that active blind spots can be removed/healed/refactored 'just' by (some kind of) learning, so they're not unchanging in the way one's terminal values are assumed to be"?

Comment by Mateusz Bagiński (mateusz-baginski) on Sapphire Shorts · 2024-12-07T09:26:45.160Z · LW · GW

cover ups

Why do you think there are cover-ups?

More specifically, do you mean that people-in-the-know are not willing to report it or that there is some active silencing or [discouragement of those who would like to bring attention to it] going on?

There was one community alert about Zizians 2y ago here. Before that, there was a discussion of Jessica Taylor's situation being downstream of Vassar's influence, but as far as I remember, Scott Alexander eventually retracted his claims about this.

In any case, I think this kind of stuff deserves a top-level alert post, like the one about Ziz.

Comment by Mateusz Bagiński (mateusz-baginski) on Alexander Gietelink Oldenziel's Shortform · 2024-12-05T14:56:55.574Z · LW · GW

Also: anybody have any recommendations for pundits/analysis sources to follow on the Taiwan situation? (there's Sentinel but I'd like something more in-depth and specifically Taiwan-related)

Comment by Mateusz Bagiński (mateusz-baginski) on mesaoptimizer's Shortform · 2024-12-01T17:11:12.828Z · LW · GW

I think Mesa is saying something like "The missing pieces are too alien for us to expect to discover them by thinking/theorizing but we'll brute-force the AI into finding/growing those missing pieces by dumping more compute into it anyway." and Tsvi's koan post is meant to illustrate how difficult it would be to think oneself into those missing pieces.

Comment by Mateusz Bagiński (mateusz-baginski) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-01T07:00:03.227Z · LW · GW

Estonia. (Alternatively, Poland, in which case: PLN, not EUR.)

Comment by Mateusz Bagiński (mateusz-baginski) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-01T05:47:55.847Z · LW · GW

I'm considering donating. Any chance of setting up some tax deduction for Euros?

Comment by Mateusz Bagiński (mateusz-baginski) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-25T18:02:29.447Z · LW · GW

I think you meant to hide these two sentences in spoiler tags, but you didn't.

Comment by Mateusz Bagiński (mateusz-baginski) on A few questions about recent developments in EA · 2024-11-23T04:51:33.789Z · LW · GW

guilt-by-association

Not necessarily guilt-by-association, but maybe rather pointing out that the two arguments/conspiracy theories share a similar flawed structure, so if you discredit one, you should discredit the other.

Still, I'm also unsure how much structure they share, and even if they did, I don't think this would be discursively effective because I don't think most people care that much about (that kind of) consistency (happy to be updated in the direction of most people caring about it).

Comment by Mateusz Bagiński (mateusz-baginski) on Which things were you surprised to learn are not metaphors? · 2024-11-22T04:31:39.401Z · LW · GW

Reminds me of how, a few years ago, I realized that I don't feel some forms of stress but can infer that I'm stressed by noticing a reduction in my nonverbal communication.

Comment by Mateusz Bagiński (mateusz-baginski) on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-21T10:35:06.743Z · LW · GW

FYI, if you want to use o1-like reasoning, you need to turn on "Deep Think".


Comment by Mateusz Bagiński (mateusz-baginski) on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-21T09:41:01.793Z · LW · GW

It's predictably censored on CCP-sensitive topics.


(In a different chat.) After the second question, it typed two lines (something like "There have been several attempts to compare Winnie the Pooh to a public individual...") and then overwrote it with "Sorry...".

Comment by Mateusz Bagiński (mateusz-baginski) on What are the good rationality films? · 2024-11-21T02:55:26.582Z · LW · GW

Astronaut.io

Comment by Mateusz Bagiński (mateusz-baginski) on Why not tool AI? · 2024-11-19T12:33:17.725Z · LW · GW

https://gwern.net/doc/existential-risk/2011-05-10-givewell-holdenkarnofskyjaantallinn.doc

Comment by Mateusz Bagiński (mateusz-baginski) on AI Craftsmanship · 2024-11-13T14:46:54.613Z · LW · GW

glitch tokens are my favorite example

Comment by Mateusz Bagiński (mateusz-baginski) on Pitfalls of the agent model · 2024-11-10T13:59:43.126Z · LW · GW

I directionally agree with the core argument of this post.

The elephant(s) in the room according to me:

  • What is an algorithm? (inb4 a physical process that can be interpreted/modeled as implementing computation)
  • How do you distinguish (hopefully in a principled way) between (a) the algorithm changing, and (b) you being confused about what algorithm the thing is actually running, so that reality is more nuanced and what "naively" looks like a change of the algorithm is "actually" just a reparametrization of it?

I haven't read the examples in this post super carefully, so perhaps you discuss this somewhere in the examples (though I don't think so because the examples don't seem to me like the place to include such discussion).

Comment by Mateusz Bagiński (mateusz-baginski) on Wittgenstein and ML — parameters vs architecture · 2024-11-09T14:28:14.647Z · LW · GW

Thanks for the post! I expected some mumbo jumbo but it turned out to be an interesting intuition pump.

Comment by Mateusz Bagiński (mateusz-baginski) on Aligning AI by optimizing for "wisdom" · 2024-10-15T09:36:52.188Z · LW · GW

Based on my attending Oliver's talk, this may be relevant/useful:

Comment by Mateusz Bagiński (mateusz-baginski) on Rationalism before the Sequences · 2024-10-09T15:57:16.423Z · LW · GW

https://en.wikipedia.org/wiki/X_Club

Comment by Mateusz Bagiński (mateusz-baginski) on A Narrow Path: a plan to deal with AI extinction risk · 2024-10-08T13:55:34.866Z · LW · GW

I too have reservations about points 1 and 3, but not providing sufficient references or justifications doesn't imply they're not on SL1.

Comment by Mateusz Bagiński (mateusz-baginski) on Overview of strong human intelligence amplification methods · 2024-10-08T12:10:33.443Z · LW · GW

mentioned in the FAQ

Comment by Mateusz Bagiński (mateusz-baginski) on Consciousness As Recursive Reflections · 2024-10-06T02:43:00.425Z · LW · GW

(I see what podcasts you listen to.)

Comment by Mateusz Bagiński (mateusz-baginski) on Shortform · 2024-09-30T17:37:54.877Z · LW · GW

My notion of progress is roughly: something that is either a building block for The Theory (i.e. marginally advancing our understanding) or a component of some solution/intervention/whatever that can be used to move probability mass from bad futures to good futures.

Re the three you pointed out: simulators I consider a useful insight, gradient hacking probably not (10% < p < 20%), and activation vectors I put in the same bin as RLHF, whatever the appropriate label for that bin is.

Comment by Mateusz Bagiński (mateusz-baginski) on Shortform · 2024-09-30T13:28:57.540Z · LW · GW

Also, I'm curious what it is that you consider(ed) AI safety progress/innovation. Can you give a few representative examples?

Comment by Mateusz Bagiński (mateusz-baginski) on Shortform · 2024-09-30T09:56:16.742Z · LW · GW
  • the approaches that have been attracting the most attention and funding are dead ends

Comment by Mateusz Bagiński (mateusz-baginski) on Ruby's Quick Takes · 2024-09-28T16:49:36.620Z · LW · GW

I'd love to try it, mainly thinking about research (agent foundations and AI safety macrostrategy).

Comment by Mateusz Bagiński (mateusz-baginski) on peterbarnett's Shortform · 2024-09-27T10:26:42.945Z · LW · GW

I propose "token surprise" (as in type-token distinction). You expected this general type of thing but not that Ivanka would be one of the tokens instantiating it.

Comment by Mateusz Bagiński (mateusz-baginski) on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It. · 2024-09-20T10:58:07.178Z · LW · GW

It's better but still not quite right. When you play on two levels, sometimes the best strategy involves a pair of (level-1 and level-2) substrategies that are seemingly opposites of each other. I don't think there's anything hypocritical about that.

Similarly, hedging is not hypocrisy.

Comment by Mateusz Bagiński (mateusz-baginski) on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It. · 2024-09-11T06:21:23.700Z · LW · GW

Do you think [playing in a rat race because it's the locally optimal thing for an individual to do, while at the same time advocating for abolishing the rat race] is an example of reformative hypocrisy?

Or even more broadly, defecting in a prisoner's dilemma while exposing an interface that would allow cooperation with other like-minded players?

I've had this concept for many years and it hadn't occurred to me to give it a name (How Stupid Not To Have Thought Of That), but if I tried to give it a name, I definitely wouldn't call it a kind of hypocrisy.

Comment by Mateusz Bagiński (mateusz-baginski) on tailcalled's Shortform · 2024-09-10T01:24:13.780Z · LW · GW

It's not clear to me how this results from "excess resources for no reasons". I guess the "for no reasons" part is crucial here?

Comment by Mateusz Bagiński (mateusz-baginski) on The Information: OpenAI shows 'Strawberry' to feds, races to launch it · 2024-09-05T19:44:06.331Z · LW · GW

I meant this strawberry problem.

Comment by Mateusz Bagiński (mateusz-baginski) on A bet for Samo Burja · 2024-09-05T16:10:05.513Z · LW · GW

Samo said that he would bet that AGI is coming perhaps in the next 20-50 years, but in the next 5.

I haven't listened to the pod yet but I guess you meant "but not in the next 5".

Comment by Mateusz Bagiński (mateusz-baginski) on Richard Ngo's Shortform · 2024-09-04T00:07:01.039Z · LW · GW

FWIW Oliver's presentation of (some fragment of) his work at ILIAD was my favorite of all the talks I attended at the conference.

Comment by Mateusz Bagiński (mateusz-baginski) on The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review · 2024-08-28T13:41:09.080Z · LW · GW

I am not totally sure why he considers discrete models to be unable to describe initial states or state-transition programming.

AFAIU, he considers them inadequate because they rely on an external interpreter, whereas the model of reality should be self-interpreting because there is nothing outside of reality to interpret it.

Wheeler suggests some principles for constructing a satisfactory explanation. The first is that "The boundary of a boundary is zero": this is an algebraic topology theorem showing that, when taking a 3d shape, and then taking its 2d boundary, the boundary of the 2d boundary is nothing, when constructing the boundaries in a consistent fashion that produces cancellation; this may somehow be a metaphor for ex nihilo creation (but I'm not sure how).

See this as an operation that takes a shape and produces its boundary. It goes 3D shape -> 2D shape -> nothing. If you reverse the arrows, you get nothing -> 2D shape -> 3D shape. (Of course, it's not quite right because (IIUC) every closed 2D surface has boundary zero, but I guess it's just meant as a rough analogy.)
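
For concreteness, here is the standard algebraic version of "the boundary of a boundary is zero", worked out for a solid triangle (a 2-simplex). This is textbook simplicial homology, nothing specific to Wheeler or Langan: every vertex shows up once with a plus sign and once with a minus sign, so everything cancels.

```latex
\[
\partial\,[v_0, v_1, v_2] = [v_1, v_2] - [v_0, v_2] + [v_0, v_1]
\]
\[
\partial\,\partial\,[v_0, v_1, v_2]
  = ([v_2] - [v_1]) - ([v_2] - [v_0]) + ([v_1] - [v_0])
  = 0
\]
```

The 3D shape -> 2D shape -> nothing picture above is the manifold version of the same cancellation (ball -> sphere -> nothing).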

He notes a close relationship between logic, cognition, and perception: for example, "X | !X" when applied to perception states that something and its absence can't both be perceived at once

This usage of logical operators is confusing. In the context of perception, he seems to want to talk about NAND: you never perceive both something and its absence but you may also not perceive either. 

(note that "X | !X" is equivalent to "!(X & !X)" in classical but not intuitionistic logic)

Intuitionistic logic doesn't allow X & !X either.[1] It allows !(X & !X).
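
To make the contrast concrete, here is a quick Lean 4 sketch (using `Classical.em` from Lean's core library): non-contradiction goes through with no classical axioms, while excluded middle only goes through by invoking a classical one.

```lean
-- Non-contradiction !(X & !X) is intuitionistically provable: the proof term below
-- uses nothing beyond the constructive core.
example (X : Prop) : ¬(X ∧ ¬X) :=
  fun h => h.2 h.1

-- Excluded middle X | !X is not constructively provable; here it is obtained from
-- `Classical.em`, which rests on Lean's classical choice axiom.
example (X : Prop) : X ∨ ¬X :=
  Classical.em X
```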

Langan contrasts between spatial duality principles ("one transposing spatial relations and objects") and temporal duality principles ("one transposing objects or spatial relations with mappings, functions, operations or processes"). This is now beyond my own understanding.

It's probably something like: if you have a spatial relationship between two objects X and Y, you can view it as an object with X and Y as endpoints. Temporally, if X causes Y, then you can see it as a function/process that, upon taking X, produces Y.


The most confusing/unsatisfying thing for me about CTMU (to the extent that I've engaged with it so far) is that it doesn't clarify what "language" is. It points ostensively at examples (formal languages, natural languages, science, perception/cognition) that apparently share some similarities, but what are those similarities?

  1. ^

    Though paraconsistent logic does.

Comment by Mateusz Bagiński (mateusz-baginski) on The Information: OpenAI shows 'Strawberry' to feds, races to launch it · 2024-08-28T03:38:07.127Z · LW · GW

Did they name it after the strawberry problem!?

Comment by Mateusz Bagiński (mateusz-baginski) on Wei Dai's Shortform · 2024-08-26T07:41:49.998Z · LW · GW

Here are some axes along which I think there's some group membership signaling in philosophy (IDK about the extent and it's hard to disentangle it from other stuff):

  • Math: platonism/intuitionism/computationalism (i.e. what is math?), interpretations of probability, foundations of math (set theory vs univalent foundations)
  • Mind: externalism/internalism (about whatever), consciousness (de-facto-dualisms (e.g. Chalmers) vs reductive realism vs illusionism), language of thought vs 4E cognition, determinism vs compatibilism vs voluntarism
  • Metaphysics/ontology: are chairs, minds, and galaxies real? (this is somewhat value-laden for many people)
  • Biology: gene's-eye-view/modern synthesis vs extended evolutionary synthesis

Comment by Mateusz Bagiński (mateusz-baginski) on A Case for the Least Forgiving Take On Alignment · 2024-08-25T12:57:44.012Z · LW · GW

Moreover, I don't think that some extra/different planning machinery was required for language itself, beyond the existing abstraction and model-based RL capabilities that many other animals share.

I would expect to see sophisticated ape/early-hominid-level culture in many more species if that were the case. For some reason humans went on the cultural RSI trajectory whereas other animals didn't. Plausibly there was some seed cognitive ability (plus some other contextual enablers) that allowed a gene-culture "coevolution" cycle to start.

Comment by Mateusz Bagiński (mateusz-baginski) on Open Thread Summer 2024 · 2024-08-25T10:02:50.964Z · LW · GW

My feedback is that I absolutely love it. My favorite feature released since reactions or audio for all posts (whichever was later).

Comment by Mateusz Bagiński (mateusz-baginski) on Extended Interview with Zhukeepa on Religion · 2024-08-24T16:54:19.324Z · LW · GW

In other words, there's a question about how to think about truth in a way that honors perspectivalism, while also not devolving into relativism. And the way Jordan and I were thinking about this, was to have each filter bubble -- with their own standards of judgment for what's true and what's good -- to be fed the best content from the other filter bubbles by the standards from within each filter bubble, rather than the worst content, which is more like what we see with social media today.

 

Seems like Monica Anderson was trying to do something like that with BubbleCity. (pdf, podcast)

Comment by Mateusz Bagiński (mateusz-baginski) on What are the best resources for building gears-level models of how governments actually work? · 2024-08-19T17:23:27.543Z · LW · GW

This is not quite an answer to your question but some recommendations in the comments to this post may be relevant: https://www.lesswrong.com/posts/SCs4KpcShb23hcTni/ideal-governance-for-companies-countries-and-more

Comment by Mateusz Bagiński (mateusz-baginski) on Alex_Altair's Shortform · 2024-08-18T16:40:49.367Z · LW · GW

It does for me

Comment by Mateusz Bagiński (mateusz-baginski) on Raemon's Shortform · 2024-08-17T07:42:48.993Z · LW · GW

Sometimes I look up a tag/concept to ensure that I'm not spouting nonsense about it.

But most often I use them to find the posts related to a topic I'm interested in.

Comment by Mateusz Bagiński (mateusz-baginski) on Self-Other Overlap: A Neglected Approach to AI Alignment · 2024-08-13T19:41:55.076Z · LW · GW

Possible problems with this approach

 

One failure mode you seem to have missed (which I'm surprised by) is that the SOO metric may be Goodhart-able. It might be the case (for all we know) that by getting the model to maximize SOO (subject to constraints of sufficiently preserving performance, etc.), you incentivize it to encode the self-other distinction in some convoluted way that is not adequately captured by the SOO metric but is sufficient for deception.

Comment by Mateusz Bagiński (mateusz-baginski) on Leaving MIRI, Seeking Funding · 2024-08-08T19:00:11.638Z · LW · GW

Radical probabilist

Paperclip minimizer

Child of LDT

Dragon logician

Embedded agent

Hufflepuff cynic

Logical inductor

Bayesian tyrant

Comment by Mateusz Bagiński (mateusz-baginski) on Alexander Gietelink Oldenziel's Shortform · 2024-08-07T13:09:13.117Z · LW · GW

Asexual species universally seem to have come into being very recently. They likely go extinct due to lack of genetic diversity and attendant mutational load catastrophe and/or losing arms races with parasites.

Bdelloidea are an interesting counterexample: they evolved obligate parthenogenesis ~25 mya.

Comment by Mateusz Bagiński (mateusz-baginski) on Near-mode thinking on AI · 2024-08-06T07:31:17.724Z · LW · GW

There's a famous prediction market about whether AI will get gold from the International Mathematical Olympiad by 2025.

correction: it's by the end of 2025

Comment by Mateusz Bagiński (mateusz-baginski) on Zach Stein-Perlman's Shortform · 2024-08-04T16:12:50.177Z · LW · GW

Also, they failed to provide the promised fraction of compute to the Superalignment team (and not because it was needed for non-Superalignment safety stuff).

Comment by Mateusz Bagiński (mateusz-baginski) on Alexander Gietelink Oldenziel's Shortform · 2024-07-30T10:36:06.442Z · LW · GW

Well, past events--before some time t--kind of obviously can't be included in the Markov blanket at time t.

As far as I understand it, the MB formalism captures only momentary causal interactions between "Inside" and "Outside" but doesn't capture a kind of synchronicity/fine-tuning-ish statistical dependency that doesn't manifest in the current causal interactions (across the Markov blanket) but is caused by past interactions.

For example, if you learned a perfect weather forecast for the next month and then went into a completely isolated bunker but kept track of what day it was, your beliefs and the actual weather would be strongly dependent even though there's no causal interaction (after you entered the bunker) between your beliefs and the weather. This is therefore omitted by MBs, and CBs want to capture that.
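
A toy simulation of the bunker example (my own illustrative sketch, not a formalization of Markov or causal blankets; the names and numbers are arbitrary):

```python
import random

def run_world(days=30, seed=None):
    """One possible world: sample the weather, let the agent read a perfect forecast,
    then seal the agent in a bunker with no further interaction with the outside."""
    rng = random.Random(seed)
    weather = [rng.choice(["sun", "rain"]) for _ in range(days)]
    belief = list(weather)  # memorized before entering the bunker; never updated again
    return weather, belief

# Across many possible worlds, the agent's belief about day 15 always matches the
# actual weather on day 15: maximal statistical dependence, zero ongoing causal
# interaction across the bunker's boundary.
matches = 0
for i in range(1000):
    weather, belief = run_world(seed=i)
    matches += belief[15] == weather[15]
print(matches / 1000)  # prints 1.0
```

The dependency lives entirely in the shared past (the forecast), which is exactly the kind of thing a momentary Markov blanket doesn't register.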