Posts

Goodhart Typology via Structure, Function, and Randomness Distributions 2025-03-25T16:01:08.327Z
Bounded AI might be viable 2025-03-06T12:55:46.224Z
Less Anti-Dakka 2024-05-31T09:07:10.450Z
Some Problems with Ordinal Optimization Frame 2024-05-06T05:28:42.736Z
What are the weirdest things a human may want for their own sake? 2024-03-20T11:15:09.791Z
Three Types of Constraints in the Space of Agents 2024-01-15T17:27:27.560Z
'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata 2023-11-15T16:00:48.926Z
Charbel-Raphaël and Lucius discuss interpretability 2023-10-30T05:50:34.589Z
"Wanting" and "liking" 2023-08-30T14:52:04.571Z
GPTs' ability to keep a secret is weirdly prompt-dependent 2023-07-22T12:21:26.175Z
How do you manage your inputs? 2023-03-28T18:26:36.979Z
Mateusz Bagiński's Shortform 2022-12-26T15:16:17.970Z
Kraków, Poland – ACX Meetups Everywhere 2022 2022-08-24T23:07:07.542Z

Comments

Comment by Mateusz Bagiński (mateusz-baginski) on The Dangers of Mirrored Life · 2025-03-26T17:02:46.581Z · LW · GW

LUCA lived around 4 billion years ago with some chirality chosen at random.

Not necessarily: https://en.wikipedia.org/wiki/Homochirality#Deterministic_theories

E.g.

Deterministic mechanisms for the production of non-racemic mixtures from racemic starting materials include: asymmetric physical laws, such as the electroweak interaction (via cosmic rays) or asymmetric environments, such as those caused by circularly polarized light, quartz crystals, or the Earth's rotation, β-Radiolysis or the magnetochiral effect. The most accepted universal deterministic theory is the electroweak interaction. Once established, chirality would be selected for.

Comment by Mateusz Bagiński (mateusz-baginski) on Map of all 40 copyright suits v. AI in U.S. · 2025-03-26T11:10:56.658Z · LW · GW

Especially given how concentrated-sparse it is.

It would be much better to have it as a Google Sheet.

Comment by Mateusz Bagiński (mateusz-baginski) on Recent AI model progress feels mostly like bullshit · 2025-03-26T08:46:53.302Z · LW · GW

How long do you[1] expect it to take to engineer scaffolding that will make reasoning models useful for the kind of stuff described in the OP?

  1. ^

    You = Ryan first and foremost, but also anybody else reading this.

Comment by Mateusz Bagiński (mateusz-baginski) on Goodhart's Law Causal Diagrams · 2025-03-25T16:04:29.725Z · LW · GW

https://www.lesswrong.com/posts/TYgztDNXhobbqMpXh/goodhart-typology-via-structure-function-and-randomness 

Comment by Mateusz Bagiński (mateusz-baginski) on rhollerith_dot_com's Shortform · 2025-03-25T07:13:53.761Z · LW · GW

My model is that

  1. some of it is politically/ideologically/self-interest-motivated
  2. some of it is just people glancing at a thing, forming an impression, and not caring to investigate further
  3. some of it is people interacting with the thing indirectly via people from the first two categories; some subset of them then take a glance at the PauseAI website or whatever out of curiosity, form an impression (e.g. whether it matches what they've heard from other people), and don't care to investigate further

Making slogans more ~precise might help with (2) and (3).

Comment by Mateusz Bagiński (mateusz-baginski) on rhollerith_dot_com's Shortform · 2025-03-24T20:46:01.974Z · LW · GW

Some people misinterpret/mispaint them(/us?) as "luddites" or "decels" or "anti-AI-in-general" or "anti-progress".

Is it their(/our?) biggest problem, one of their(/our?) bottlenecks? Most likely not.

It might still make sense to make marginal changes that make it marginally harder to do that kind of mispainting / reduce misinterpretative degrees of freedom.

Comment by Mateusz Bagiński (mateusz-baginski) on rhollerith_dot_com's Shortform · 2025-03-24T19:22:07.551Z · LW · GW

You can still include it in your protest banner portfolio to decrease the fraction of people whose first impression is "these people are against AI in general" etc.

Comment by Mateusz Bagiński (mateusz-baginski) on Solving willpower seems easier than solving aging · 2025-03-24T12:06:20.300Z · LW · GW

This closely parallels the situation with the immune system.

One might think "I want a strong immune system. I want to be able to fight every dangerous pathogen I might encounter."

You go to your local friendly genie and ask for a strong immune system.

The genie fulfills your wish. No more seasonal flu. You don't need to bother with vaccines. You even consider giving up washing your hands, but then you realize that other people are still not immune to whatever bugs might be on your skin.

Then, a few weeks in, you go into anaphylactic shock while eating your favorite peanut butter sandwich. An ambulance takes you to the hospital, where they also tell you that you have Hashimoto's.

You go to your genie to ask "WTF?" and the genie replies, "You asked for a strong immune system, not a smart one. It was not my task to ensure that it knows that peanut protein is not the protein of some obscure worm, even though they might look alike, or that the thyroid is a part of your own body."

Comment by Mateusz Bagiński (mateusz-baginski) on Solving willpower seems easier than solving aging · 2025-03-24T11:53:56.751Z · LW · GW

I have experimented some with meditation specifically with the goal of embracing the DMN (with few definite results)

I'd be curious to hear more details on what you've tried.

Comment by Mateusz Bagiński (mateusz-baginski) on Solving willpower seems easier than solving aging · 2025-03-24T11:46:45.675Z · LW · GW

Relevant previous discussion: https://www.lesswrong.com/posts/XYYyzgyuRH5rFN64K/what-makes-people-intellectually-active 

Comment by Mateusz Bagiński (mateusz-baginski) on Solving willpower seems easier than solving aging · 2025-03-24T11:41:08.851Z · LW · GW

Then the effect would be restricted to people who are trying to control their eating, which we would probably have heard about by now.

Comment by Mateusz Bagiński (mateusz-baginski) on Why Were We Wrong About China and AI? A Case Study in Failed Rationality · 2025-03-24T08:55:01.278Z · LW · GW

What is some moderately strong evidence that China (by which I mean Chinese AI labs and/or the CCP) is trying to build AGI, rather than "just" building AI that is useful for whatever they want their AIs to do and not falling behind the West, while also not taking Western claims about AGI/ASI/singularity at face value?

Comment by Mateusz Bagiński (mateusz-baginski) on Why Were We Wrong About China and AI? A Case Study in Failed Rationality · 2025-03-24T08:40:45.775Z · LW · GW

DeepSeek from my perspective should incentivize slowing down development (if you agree with the fast follower dynamic. Also by reducing profit margins generally), and I believe it has.

Any evidence of DeepSeek marginally slowing down AI development?

Comment by Mateusz Bagiński (mateusz-baginski) on Metacognition Broke My Nail-Biting Habit · 2025-03-23T19:14:07.129Z · LW · GW

There's a psychotherapy school called "metacognitive therapy", and some people swear by it as being simple and a solution to >50% of psychological problems because it targets their root causes. (I'm saying this from memory of a podcast I listened to in the summer of 2023 and then failed to research further, so my description might be off, but maybe somebody will find some value in it.)

https://podcast.clearerthinking.org/episode/173/pia-callesen-using-metacognitive-therapy-to-break-the-habit-of-rumination/ 

Comment by Mateusz Bagiński (mateusz-baginski) on Dusty Hands and Geo-arbitrage · 2025-03-23T19:06:22.204Z · LW · GW

In the case of engineering humans for increased IQ, Indians show broad support for such technology in surveys (even in the form of rather extreme intelligence enhancement), so one might focus on doing research there and/or lobbying its people and government to fund such research. High-impact Indian citizens interested in this topic seem like very good candidates for funding, especially those with the potential of snowballing internal funding sources that will be insulated from western media bullying.

I've also heard that AI X-risk is much more viral in India than EA in general (in comparative terms, relative to the West).

And in terms of "Anything right-leaning" a parallel EA culture, preferably with a different name, able to cultivate right-wing funding sources might be effective.

Progress studies? Not that they are necessarily right-leaning themselves but if you integrate support for [progress-in-general and doing a science of it] over the intervals of the political spectrum, you might find that center-right-and-righter supports it more than center-left-and-lefter (though low confidence and it might flip if you ignore the degrowth crowd).

Comment by Mateusz Bagiński (mateusz-baginski) on Dusty Hands and Geo-arbitrage · 2025-03-23T18:54:35.860Z · LW · GW

With the exception of avoiding rationalists (and can we really blame Moskovitz for that?)

Care to elaborate?

Comment by Mateusz Bagiński (mateusz-baginski) on Solving willpower seems easier than solving aging · 2025-03-23T16:13:30.059Z · LW · GW

Some amphetamines kinda solve akrasia-in-general to some extent (much more so than caffeine), at least for some people.

I'm not claiming that they're worth it.

Comment by Mateusz Bagiński (mateusz-baginski) on Solving willpower seems easier than solving aging · 2025-03-23T16:09:44.193Z · LW · GW

I imagine "throw away your phone" will get me 90% of the way there.

I strongly recommend https://www.minimalistphone.com/ 

It didn't get me 90% of the way there ("there" being "completely eliminating/solving akrasia") but it probably did reduce [spending time on my phone in ways I don't endorse] by at least one order of magnitude.

Comment by Mateusz Bagiński (mateusz-baginski) on Towards a scale-free theory of intelligent agency · 2025-03-23T14:42:17.522Z · LW · GW

Active inference is an extension of predictive coding in which some beliefs are so rigid that, when they conflict with observations, it’s easier to act to change future observations than it is to update those beliefs. We can call these hard-to-change beliefs “goals”, thereby unifying beliefs and goals in a way that EUM doesn’t.

You're probably aware of this, but it's worth making explicit that this move also puts many biases, addictions, and maladaptive/disendorsed behaviors in the goal category.

EUM treats goals and beliefs as totally separate. But in practice, agents represent both of these in terms of the same underlying concepts. When those concepts change, both beliefs and goals change.

Active inference is one framework that attempts to address this. Jeffrey-Bolker is another, though I haven't dipped my toes into it deeply enough to have an informed opinion on whether it's more promising than active inference for what you want to do.

Based on similar reasoning, Scott Garrabrant rejects the independence axiom. He argues that the axiom is unjustified because rational agents should be able to lock in values like fairness based on prior agreements (or even hypothetical agreements).

At first I thought this introduces epistemic instability, because vNM EU theory rests on the independence axiom (so it looked like: to unify EU theory with active inference, you wanted to reject one of the things that defines EU theory qua EU theory), but then I realized that you hadn't assumed vNM as a foundation for EU theory, so maybe it's irrelevant. Still, as far as I remember, different foundations of EU theory give you slightly different implications (and many of them have some equivalent of the independence axiom; Savage's does, at least), so it might be good for you to think explicitly about which foundation of EU theory you're assuming. But it also might be irrelevant. I don't know. I'm leaving this thought-train dump in case it's useful.
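For reference, a standard statement of the vNM independence axiom under discussion: for all lotteries $A$, $B$, $C$ and any $p \in (0, 1]$,

$$A \succeq B \iff pA + (1-p)C \succeq pB + (1-p)C,$$

i.e., mixing both options with the same third lottery in the same proportion shouldn't change the preference; this mixing-invariance is what Garrabrant's fairness/lock-in argument rejects.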

Comment by mateusz-baginski on [deleted post] 2025-03-22T18:23:03.070Z

I don't think anything I said implied interest in your thesis. 

I was mostly explaining why pasting your thesis in Spanish to LW was a breach of some implicit norms that I thought were so obvious they didn't even need to be stated, and also trying to understand why you did it. (You linked some previous question post, but I couldn't find the answer to my question with a quick ctrl+F for keywords, which I think is a reasonable amount of effort when someone answers a specific, simple question with a link that doesn't straightforwardly answer it.)

Comment by mateusz-baginski on [deleted post] 2025-03-22T13:37:57.632Z

LW is an English-speaking site. I've never seen a non-English post (or even comment?), and while I don't know whether it's explicitly written anywhere, I feel like a long post in Spanish is not aligned with this site's spirit/intention.

If I wanted to share my work that is not in English, I would make a linkpost with a translated abstract and maybe an introduction and link to the pdf in some online repository.

Comment by mateusz-baginski on [deleted post] 2025-03-22T10:47:32.745Z

Why are you posting a post in Spanish on LessWrong?

Tesis presentada como exigencia parcial a la obtención del título de Especialista en Neurociencia Clínica de la Facultad AVM ["Thesis presented as a partial requirement for obtaining the title of Specialist in Clinical Neuroscience from the Facultad AVM"]

Did you just copy-paste the PDF of some guy's thesis?

Comment by Mateusz Bagiński (mateusz-baginski) on Rafael Harth's Shortform · 2025-03-21T12:41:56.739Z · LW · GW

I use Ubuntu and I also find myself opening apps mostly by searching. I think the only reason I put anything on the desktop is to be reminded that these are the things I'm doing/reading at the moment (?).

Comment by Mateusz Bagiński (mateusz-baginski) on TurnTrout's shortform feed · 2025-03-20T18:47:29.373Z · LW · GW

I've been wondering about this for a while, so I'm just going to be opportunistic and ask here.

How is your current focus on interpy/empirical stuff related to Shard Theory? (I presume there's still some relevant connection, given that you're calling yourselves "Team Shard".)

Comment by Mateusz Bagiński (mateusz-baginski) on An Advent of Thought · 2025-03-20T16:25:44.316Z · LW · GW

Is the infinitude of "how should one think?" the "main reason" why philosophy is infinite? Is it the main reason for most particular infinite philosophical problems being infinite? I would guess that it is not — that there are also other important reasons; in particular, if a philosophical problem is infinite, I would expect there to at least also be some reason for its infinitude which is "more internal to it". In fact, I suspect that clarifying the [reasons why]/[ways in which] endeavors can end up infinite is itself an infinite endeavor :).

 

Attribution is probably very (plausibly infinitely) tricky here, similar to how it's tricky to state why certain facts about mathematical structures are true while others are false. Sure, one can point to a proof of the theorem and say "That's why P is true," but this conflates the reason why one knows/believes P with the reason why P holds, especially when there are multiple, very different ways to prove the same proposition.[1] Or at least that's how it feels to me.

  1. ^

    I don't know how often this is the case. It would be fun to see something like a distribution of proof-diversity scores for some quasi-representative set of mathematical propositions, where by "proof diversity" I mean something like information diameter (Li & Vitányi, Def 8.5.1).

Comment by Mateusz Bagiński (mateusz-baginski) on On MAIM and Superintelligence Strategy · 2025-03-20T11:21:21.324Z · LW · GW

If we end up with MAIM, here is how I think it might work:

If we establish such a thing, it will probably be sufficient for robust global coordination on AI governance, so that we wouldn't need MAIM (or at the very least, we would have a maimed version of MAIM that is continuous with what people talked about before Hendrycks et al. introduced the concept).

Comment by Mateusz Bagiński (mateusz-baginski) on An Advent of Thought · 2025-03-19T21:24:58.594Z · LW · GW

An idea-image that was bubbling in my mind while I was reading Note 1.

One naive definition of an infinite endeavor would be something like "a combinatorially exploded space of possible constructions/[phenomena to investigate]" where the explosion can stem from some finite set of axioms and inference rules or whatever.

I don't think this is endeavor-infinity in the sense you're talking about here. There's probably a reason you called (were tempted to call?) it an "infinite endeavor", not an "infinite domain". A domain is "just out there". An endeavor is both "out there" and in the agent doing/[participating in] the endeavor. That agent (thinker?) has a taste and applying that taste to that infinite but [possibly in a certain sense finitely specifiable/constrainable] thing is what makes it an endeavor and grants it its infinite character.

Taste creates little pockets of interestingness in combinatorially exploded stuff, and those pockets have smaller pockets still; or perhaps a better metaphor would be an infinite landscape, certain parts of which the thinker's taste lights up with salience. Once you "conquer" one salient location, you come to see the salience of other locations because of what you learned there. Or perhaps you updated your taste, to the extent that these two are distinguishable. Or perhaps the landscape itself updated, because you realized that you "just can" not assume Euclid's fifth postulate or tertium non datur or ex contradictione quodlibet, and consequently discover a new way to do math: non-Euclidean geometries or new kinds of logic.

If I were to summarize my ad-hoc-y image of endeavor-infinity at the moment, it would be something like:

An infinite endeavor emerges from a thinker imposing/applying some (very likely proleptic) taste/criterion to a domain and then exploring (and continuing to construct?) that domain according to that taste/criterion's guidance; where all three of {thinker, criterion, domain} (have the potential to) grow in the process. 

(which contains a lot of ad-hoc-y load-bearing concepts to be elucidated for sure) 


I've only read the first note so far and loved it. I will surely read the rest.

Comment by Mateusz Bagiński (mateusz-baginski) on Joseph Miller's Shortform · 2025-03-19T16:16:19.085Z · LW · GW

Not OP, but IME it might (1) insist that it's right, (2) apologize, think again, and generate the code again, but it's mostly the same thing (in which case it might or might not claim it fixed something), or (3) apologize, think again, generate the code again, and it's not mostly the same thing.

Comment by Mateusz Bagiński (mateusz-baginski) on Open Thread Spring 2025 · 2025-03-19T09:42:24.382Z · LW · GW

I have the impression that there's been a recent increase in the number of users deactivating/deleting their LW accounts. As I say, it's just an impression, no stats or anything, so I'm wondering whether that's actually the case, and if so, what the causes might be (assuming it's not a statistical fluke).

Comment by Mateusz Bagiński (mateusz-baginski) on I changed my mind about orca intelligence · 2025-03-18T10:32:52.771Z · LW · GW

TLDR: I now think it’s <1% likely that average orcas are >=+6std intelligent.

 

I suggest explaining in the TLDR what you mean by ">=+6std intelligent". +6std with respect to what intelligence? Human intelligence!? If so, please provide more context, as this sounds quite unlikely.

Comment by Mateusz Bagiński (mateusz-baginski) on TsviBT's Shortform · 2025-03-17T16:09:15.603Z · LW · GW

On average, people get less criminal as they get older, so that would point towards human kindness increasing in time. On the other hand, they also get less idealistic, on average, so maybe a simpler explanation is that as people get older, they get less active in general.

When I read Tsvi's OP, I was imagining something like a (trans-/post- but not too post-)human civilization where everybody by default has an unbounded lifespan and healthspan, possibly somewhat boosted intelligence and need for cognition / open intellectual curiosity. (In which case, "people tend to X as they get older", where X is something mostly due to things related to default human aging, doesn't apply.)

Now start it as a modern-ish democracy or a cluster of (mostly) democracies, run for 1e4 to 1e6 years, and see what happens.

Comment by Mateusz Bagiński (mateusz-baginski) on TsviBT's Shortform · 2025-03-17T15:55:24.725Z · LW · GW

Past this point, you're likely never returning to bothering about them. Why would you, if you can instead generate entire worlds of the kinds of people/entities/experiences you prefer? It seems incredibly unlikely that human social instincts can only be satisfied – or even can be best satisfied – by other humans.

For the same reason that most people (if given the power to do so) wouldn't just replace their loved ones with altered versions of them that are better along whatever dimensions they judged them deficient/imperfect on.

Comment by Mateusz Bagiński (mateusz-baginski) on TsviBT's Shortform · 2025-03-17T15:53:08.247Z · LW · GW

Source: I have meta-preferences to freeze some of my object-level values at "eudaimonia", and I take specific deliberate actions to avoid or refuse value-drift on that.

I'm curious to hear more about those specific deliberate actions.

Comment by Mateusz Bagiński (mateusz-baginski) on Joseph Miller's Shortform · 2025-03-17T07:54:16.513Z · LW · GW

Naive idea:

Get an LLM to generate a TLDR of the post, and after the user finishes reading the post, show a pop-up: "Was opening the post worth it, given that you'd already read the TLDR?"

Comment by Mateusz Bagiński (mateusz-baginski) on Michael Dickens' Caffeine Tolerance Research · 2025-03-16T07:35:31.458Z · LW · GW

Well, OK, this doesn't explain why his reaction time without caffeine also improved (even more so than with caffeine), but perhaps that could be explained by something like: caffeine increases the efficiency of some circuits; reaction-time tests/exercises sculpt those circuits, and sculpt them even more when on caffeine; and some of that sculpting persists even when not on caffeine. (Speculating, of course.)

 

Comment by Mateusz Bagiński (mateusz-baginski) on Michael Dickens' Caffeine Tolerance Research · 2025-03-16T07:31:10.210Z · LW · GW

This outcome is statistically significant (p = 0.016), but the data show a weird pattern: caffeine’s effectiveness went up over time instead of staying flat. I don’t know how to explain that, which makes me suspicious of the experiment’s findings.

Something something you adapt to a ~constant effect of the substance, thus learning to leverage it better?

Comment by Mateusz Bagiński (mateusz-baginski) on Coffee: When it helps, when it hurts · 2025-03-16T07:29:41.530Z · LW · GW

Rodents have a much higher tolerance for many drugs than humans (I think e.g. 40× more tolerant of ethanol).

Comment by Mateusz Bagiński (mateusz-baginski) on MakoYass's Shortform · 2025-03-16T07:19:20.936Z · LW · GW

Were OpenAI also, in theory, able to release sooner than they did, though?

A smaller issue, but OA did sit on GPT-2 for a few months between publishing the paper and open-sourcing the model, apparently due to safety considerations.

Comment by Mateusz Bagiński (mateusz-baginski) on TsviBT's Shortform · 2025-03-16T07:08:23.239Z · LW · GW

A particularly annoying-to-me kind of discourse wormhole:

Alice starts arguing and the natural interpretation of her argument is that she's arguing for claim X. As the discussion continues and evidence/arguments[1] amass against X, she nimbly switches to arguing for an adjacent claim Y, pretending that Y is what she's been arguing for all along (which might even go unnoticed by her interlocutors).

  1. ^

    Or even, eh, social pressures, etc.

Comment by Mateusz Bagiński (mateusz-baginski) on plex's Shortform · 2025-03-16T06:59:14.584Z · LW · GW

but I put very low odds on being in this world

How low?

Comment by Mateusz Bagiński (mateusz-baginski) on Mateusz Bagiński's Shortform · 2025-03-12T17:56:08.426Z · LW · GW

beyond doom and gloom - towards a comprehensive parametrization of beliefs about AI x-risk

doom - what is the probability of AI-caused X-catastrophe (i.e. p(doom))?

gloom - how viable is p(doom) reduction?

foom - how likely is RSI?

loom - are we seeing any signs of AGI soon, looming on the horizon?

boom - if humanity goes extinct, how fast will it be?

room - if AI takeover happens, will AI(s) leave us a sliver of the light cone?

zoom - how viable is increasing our resolution on AI x-risk?

Comment by mateusz-baginski on [deleted post] 2025-03-11T12:35:22.717Z

redundant/[should be merged with] https://www.lesswrong.com/w/ai-racing 

Comment by mateusz-baginski on [deleted post] 2025-03-11T12:34:47.945Z

redundant/[should be merged with] https://www.lesswrong.com/w/ai-arms-race? 

Comment by Mateusz Bagiński (mateusz-baginski) on AISN #49: Superintelligence Strategy · 2025-03-10T11:01:56.672Z · LW · GW

I disagree with framing these results in terms of "dishonesty" or "intentional deception".

Or, at least, it's severely under-argued that this framing is more accurate than "more capable models produce more accurate statements by default and are also more capable of taking on whatever role you imply in the prompt".

Comment by Mateusz Bagiński (mateusz-baginski) on johnswentworth's Shortform · 2025-03-10T10:43:40.462Z · LW · GW

I don't think I implied that John's post implied that, and I don't think going into the woods non-indefinitely mitigates the thing I pointed out.

Comment by Mateusz Bagiński (mateusz-baginski) on johnswentworth's Shortform · 2025-03-10T06:29:49.493Z · LW · GW

Solution 2 implies that a smart person with a strong technical background would (by default) go on to work on important problems, which is not necessarily universally true; IMO it's likely that many such people would be working on less important things than what their social circle is otherwise steering them to work on.

Comment by Mateusz Bagiński (mateusz-baginski) on Statistical Challenges with Making Super IQ babies · 2025-03-05T17:14:44.077Z · LW · GW

doesn't feel like an obviously impossible notion that very high IQs might have had a negative effect on fertility in that time as well.

or that some IQ-increasing variants affect stuff other than intelligence in ways that are disadvantageous/fitness-decreasing in some contexts

Comment by Mateusz Bagiński (mateusz-baginski) on A Bear Case: My Predictions Regarding AI Progress · 2025-03-05T17:10:18.616Z · LW · GW

Thanks!

At some unknown point – probably in 2030s

Why do you think it's probably the 2030s?

Comment by Mateusz Bagiński (mateusz-baginski) on faul_sname's Shortform · 2025-03-05T16:52:20.394Z · LW · GW

Alright, fair, I misread the definition of "homeostatic agents".

Comment by Mateusz Bagiński (mateusz-baginski) on faul_sname's Shortform · 2025-03-04T16:00:37.667Z · LW · GW

I interpreted "unbounded" as "aiming to maximize expected value of whatever", not "unbounded in the sense of bounded rationality".