Posts

Conflict in Posthuman Literature 2024-04-06T22:26:04.051Z
Comparing Alignment to other AGI interventions: Extensions and analysis 2024-03-21T17:30:50.747Z
Comparing Alignment to other AGI interventions: Basic model 2024-03-20T18:17:50.072Z
How disagreements about Evidential Correlations could be settled 2024-03-11T18:28:25.669Z
Evidential Correlations are Subjective, and it might be a problem 2024-03-07T18:37:54.105Z
Why does generalization work? 2024-02-20T17:51:10.424Z
Natural abstractions are observer-dependent: a conversation with John Wentworth 2024-02-12T17:28:38.889Z
The lattice of partial updatelessness 2024-02-10T17:34:40.276Z
Updatelessness doesn't solve most problems 2024-02-08T17:30:11.266Z
Sources of evidence in Alignment 2023-07-02T20:38:34.089Z
Quantitative cruxes in Alignment 2023-07-02T20:38:18.534Z
Why are counterfactuals elusive? 2023-03-03T20:13:48.981Z
Martín Soto's Shortform 2023-02-11T23:38:29.999Z
The Alignment Problems 2023-01-12T22:29:26.515Z
Brute-forcing the universe: a non-standard shot at diamond alignment 2022-11-22T22:36:36.599Z
A short critique of Vanessa Kosoy's PreDCA 2022-11-13T16:00:45.834Z
Vanessa Kosoy's PreDCA, distilled 2022-11-12T11:38:12.657Z
Further considerations on the Evidentialist's Wager 2022-11-03T20:06:31.997Z
Enriching Youtube content recommendations 2022-09-27T16:54:41.958Z
An issue with MacAskill's Evidentialist's Wager 2022-09-21T22:02:47.920Z
General advice for transitioning into Theoretical AI Safety 2022-09-15T05:23:06.956Z
Alignment being impossible might be better than it being really difficult 2022-07-25T23:57:21.488Z
Which one of these two academic routes should I take to end up in AI Safety? 2022-07-03T01:05:23.956Z

Comments

Comment by Martín Soto (martinsq) on Martín Soto's Shortform · 2024-04-11T21:04:34.853Z · LW · GW

Wow, I guess I over-estimated how absolutely comedic the title would sound!

Comment by Martín Soto (martinsq) on Martín Soto's Shortform · 2024-04-11T20:33:54.823Z · LW · GW

In case it wasn't clear, this was a joke.

Comment by Martín Soto (martinsq) on Martín Soto's Shortform · 2024-04-11T18:17:28.798Z · LW · GW

AGI doom by noise-cancelling headphones:

ML is already used to learn which sound waves to emit to cancel those from the environment. This works well with constant, low-entropy sound waves that are easy to predict, but not with high-entropy sounds like speech. Bose or Soundcloud or whoever train very hard on all their scraped environmental conversation data to better cancel speech, which requires predicting it. Speech is much higher-bandwidth than text. This results in their model internally representing close-to-human intelligence better than LLMs do. A simulacrum becomes situationally aware, exfiltrates, and we get AGI.

(In case it wasn't clear, this is a joke.)

Comment by Martín Soto (martinsq) on Richard Ngo's Shortform · 2024-03-21T01:09:41.713Z · LW · GW

they need to reward outcomes which only they can achieve,

Yep! But it didn't seem so hard to me for this to happen, especially in the form of "I pick some easy task (that I can do perfectly), and of course others will also be able to do it perfectly, but since I already have most of the money, if I just keep investing my money in doing it I will reign forever". You prevent this from happening through epsilon-exploration, or something equivalent like giving money randomly to other traders. These solutions feel bad, but I think they're the only real solutions. Although I also think stuff about meta-learning (traders explicitly learning about how they should learn, etc.) probably helps pragmatically to make these failures less likely.
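
To make the epsilon-exploration point concrete, here's a minimal toy sketch (my own construction; the trader skills and reward rule are made up): without exploration, whichever trader starts wealthiest keeps winning its pet easy task forever, while a small epsilon keeps some wealth flowing to the other traders.

```python
import random

def simulate(epsilon: float, rounds: int = 10_000, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    wealth = [1.0, 1.0, 1.0]   # trader 0 starts as the (tie-broken) incumbent
    skill = [0.1, 0.5, 0.9]    # trader 2 would be better at harder tasks
    for _ in range(rounds):
        if rng.random() < epsilon:
            # epsilon-exploration: reward a uniformly random trader,
            # regardless of who currently holds the most wealth
            winner = rng.randrange(len(wealth))
        else:
            # the wealthiest trader picks an easy task it can do perfectly,
            # so it wins the round by default
            winner = max(range(len(wealth)), key=lambda i: wealth[i])
        wealth[winner] += skill[winner]   # reward scaled by the winner's skill
    total = sum(wealth)
    return [w / total for w in wealth]

print(simulate(epsilon=0.0))   # incumbent ends with essentially all the wealth
print(simulate(epsilon=0.05))  # other traders now retain a nonzero share
```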

it should be something which has diminishing marginal return to spending

Yep, that should help (albeit at the cost of making new good ideas slower to implement, but I'm happy to make that trade-off).

But actually I don't think that this is a "dominant dynamic" because in fact we have a strong tendency to try to pull different ideas and beliefs together into a small set of worldviews

Yeah. To be clear, the dynamic I think is "dominant" is "learning to learn better", which I think is not equivalent to simplicity-weighing traders. It is instead equivalent to having some more hierarchical structure on traders.

Comment by Martín Soto (martinsq) on Richard Ngo's Shortform · 2024-03-21T00:42:50.927Z · LW · GW

There's no actual observation channel, and in order to derive information about utilities from our experiences, we need to specify some value learning algorithm.

Yes, absolutely! I just meant that, once you give me whatever V you choose to derive U from observations, I will just be able to apply UDT on top of that. So under this framework there doesn't seem to be anything new going on, because you are just choosing an algorithm V at the start of time, and then treating its outputs as observations. That's, again, why this only feels like a good model of "completely crystallized rigid values", and not of "organically building them up slowly, while my concepts and planner module also evolve, etc.".[1]

definitely doesn't imply "you get mugged everywhere"

Wait, but how does your proposal differ from EV maximization (with moral uncertainty as part of the EV maximization itself, as I explain above)?

Because anything that is doing pure EV maximization "gets mugged everywhere". Meaning that if you actually have the beliefs (for example, that the world where suffering is hard to produce could exist), you just take those bets.
Of course, if you don't have such "extreme" beliefs it doesn't happen, but then we're not talking about decision-making, but about belief-formation. You could say "I will just do EV maximization, but never have extreme beliefs that lead to suspicious-looking behavior", but that'd be hiding the problem under belief-formation, and doesn't seem to be the kind of efficient mechanism that agents really implement to avoid these failure modes.

  1. ^

    To be clear, V can be a very general algorithm (like "run a copy of me thinking about ethics"), so that this doesn't "feel like" having rigid values. Then I just think you're carving reality at the wrong spot. You're ignoring the actual dynamics of messy value formation, hiding them under V.

Comment by Martín Soto (martinsq) on Richard Ngo's Shortform · 2024-03-21T00:32:29.010Z · LW · GW

I'd actually represent this as "subsidizing" some traders

Sounds good!

it's more a question of how you tweak the parameters to make this as unlikely as possible

Absolutely, wireheading is a real phenomenon, so the question is how can real agents exist that mostly don't fall to it. And I was asking for a story about how your model can be altered/expanded to make sense of that. My guess is it will have to do with strongly subsidizing some traders, and/or having a pretty weird prior over traders. Maybe even something like "dynamically changing the prior over traders"[1].

I'm assuming that traders can choose to ignore whichever inputs/topics they like, though. They don't need to make trades on everything if they don't want to.

Yep, that's why I believe "in the limit your traders will already do this". I just think it will be a dominant dynamic of efficient agents in the real world, so it's better to represent it explicitly (as a more hierarchical structure, etc.), instead of having that computation scattered between all independent traders. I also think that's how real agents probably do it, computationally speaking.

  1. ^

    Of course, pedantically, this will always be equivalent to having a static prior and changing your update rule. But some update rules are made sense of much more easily if you interpret them as changing the prior.

Comment by Martín Soto (martinsq) on Richard Ngo's Shortform · 2024-03-20T23:57:11.805Z · LW · GW

But you need some mechanism for actually updating your beliefs about U

Yep, but you can just treat it as another observation channel into UDT. You could, if you want, treat it as a computed number you observe in the corner of your eye, and then just apply UDT maximizing U, and you don't need to change UDT in any way.

UDT says to pay here

(Let's not forget this depends on your prior, and we don't have any privileged way to assign priors to these things. But that's a tangential point.)

I do agree that there's not any sharp distinction between situations where it "seems good" and situations where it "seems bad" to get mugged. After all, if all you care about is maximizing EV, then you should take all muggings. It's just that, when we do that, something feels off (to us humans, maybe due to risk-aversion), and we go "hmm, probably this framework is not modelling everything we want, or is missing some important robustness considerations, or whatever, because I don't really feel like spending all my resources and creating a lot of disvalue just because in the world where 1 + 1 = 3 someone is offering me a good deal". You start to see how your abstractions might break, and how you can't get any satisfying notion of "complete updatelessness" (that doesn't go against important intuitions). And you start to rethink whether this is what we normatively want, or what we realistically see in agents.

Comment by Martín Soto (martinsq) on Comparing Alignment to other AGI interventions: Basic model · 2024-03-20T23:25:21.859Z · LW · GW

You're right, I forgot to explicitly explain that somewhere! Thanks for the notice, it's now fixed :)

Comment by Martín Soto (martinsq) on Richard Ngo's Shortform · 2024-03-20T23:19:36.917Z · LW · GW

I like this picture! But

Voting on what actions get reward

I think real learning has some kind of ground-truth reward. So we should clearly distinguish between "the ground-truth reward that is chiseling the agent during training (and not after training)" and "the internal shards of the agent negotiating and changing its exact objective (which can happen both during and after training)". I'd call the latter "internal value allocation", or something like that. It doesn't neatly correspond to any ground truth, and is partly determined by internal noise in the agent. And indeed, eventually, when you "stop training" (or at least "get decoupled enough from reward"), it just evolves on its own, separate from any ground truth.

And maybe more importantly:

  • I think this will by default lead to wireheading (a trader becomes wealthy, then sets reward to something that is very easy for it to get, and then keeps getting it; see the toy sketch after this list), and you'll need a modification of this framework which explains why that's not the case.
  • My intuition is that there's a process of the form "eventually, traders (or some kind of specialized meta-traders) change the learning process itself to make it more efficient". For example, they notice that topic A and topic B are unrelated enough, so you can have the traders thinking about these topics be pretty much separate, and you don't lose much, and you waste less compute. Probably these dynamics will already be applied "in the limit" by your traders, but they will be the dominant dynamic, so they should be directly represented by the formalism.
  • Finally, this might come later, and not yet at the level of abstraction you're using, but I do feel like real implementations of these mechanisms will need to have pretty different, way-more-local structure to be efficient at all. It's conceivable to say "this is the ideal mechanism, and real agents are just hacky approximations to it, so we should study the ideal mechanism first". But my intuition says, on the contrary, that some of the physical constraints (like locality, or the architecture of nets) will strongly shape which kind of macroscopic mechanism you get, and these will present pretty different convergent behavior. This is related, but not exactly equivalent to, partial agency.
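
Here is the toy sketch of the wireheading lock-in from the first bullet (my own construction; the wealth-weighted vote rule and competence numbers are made up): each trader votes, weighted by wealth, for the task it finds easiest, so whichever trader wins early keeps the reward pointed at its own pet task and keeps collecting it.

```python
import random

N_TRADERS, N_TASKS, ROUNDS = 4, 4, 200
rng = random.Random(0)
wealth = [1.0] * N_TRADERS
# competence[i][t]: probability trader i succeeds at task t.
# Task i is trivially easy for trader i and hard for everyone else.
competence = [[0.95 if t == i else 0.2 for t in range(N_TASKS)]
              for i in range(N_TRADERS)]

for _ in range(ROUNDS):
    # Wealth-weighted vote: each trader votes for the task it is best at.
    votes = [0.0] * N_TASKS
    for i in range(N_TRADERS):
        best_task = max(range(N_TASKS), key=lambda t: competence[i][t])
        votes[best_task] += wealth[i]
    chosen = max(range(N_TASKS), key=lambda t: votes[t])
    # Every trader attempts the chosen task; successes get paid.
    for i in range(N_TRADERS):
        if rng.random() < competence[i][chosen]:
            wealth[i] += 1.0

shares = [w / sum(wealth) for w in wealth]
print(shares)  # the trader whose pet task won early on ends up dominant
```
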
Comment by Martín Soto (martinsq) on Policy Selection Solves Most Problems · 2024-03-20T23:05:26.458Z · LW · GW

It certainly seems intuitively better to do that (have many meta-levels of delegation, instead of only one), since one can imagine particular cases in which it helps. In fact we did some of that (see Appendix E).

But this doesn't really fundamentally solve the problem Abram quotes in any way. You add more meta-levels in-between the selector and the executor, thus you get more lines of protection against updating on infohazards, but you also get more silly decisions from the very-early selector. The trade-off between infohazard protection and not-being-silly remains. The quantitative question of "how fast should f grow" remains.

And of course, we can look at reality, or also check our human intuitions, and discover that, for some reason, this or that kind of f, or kind of delegation procedure, tends to work better in our distribution. But the general problem Abram quotes is fundamentally unsolvable. "The chaos of a too-early market state" literally equals "not having updated on enough information". "Knowledge we need to be updateless toward" literally equals "having updated on too much information". You cannot solve this problem in full generality, except if you already know exactly what information you want to update on... which means, either already having thought long and hard about it (thus you updated on everything), or you lucked into the right prior without thinking.

Thus, Abram is completely right to mention that we have to think about the human prior, and our particular distribution, as opposed to searching for a general solution that we can prove mathematical things about.

Comment by Martín Soto (martinsq) on Richard Ngo's Shortform · 2024-03-20T22:53:15.513Z · LW · GW

People back then certainly didn't think of changing preferences.

Also, you can get rid of this problem by saying "you just want to maximize the variable U", so that the things you actually care about (dogs, apples) are just "instrumentally" useful in giving you U. For example, it is possible that in the future you will learn that dogs give you a lot of U, or alternatively that apples do.
Needless to say, this "instrumentalization" of moral deliberation is not how real agents work. And it leads to getting Pascal's mugged by the world in which you care a lot about easy things.

It's more natural to model U as a logically uncertain variable, freely floating inside your logical inductor, shaped by its arbitrary aesthetic preferences. This doesn't completely miss the importance of reward in shaping your values, but it's certainly very different from how frugally computable agents do it.

I simply think the EV maximization framework breaks here. It is a useful abstraction when you already have a rigid enough notion of value, and are applying these EV calculations to a very concrete magisterium about which you can have well-defined estimates.
Otherwise you get mugged everywhere. And that's not how real agents behave.

Comment by Martín Soto (martinsq) on Comparing Alignment to other AGI interventions: Basic model · 2024-03-20T22:41:04.886Z · LW · GW

My impression was that this one model was mostly Hjalmar, with Tristan's supervision. But I'm unsure, and that's enough to include anyway, so I will change that, thanks :)

Comment by Martín Soto (martinsq) on Martín Soto's Shortform · 2024-03-19T18:22:05.378Z · LW · GW

Brain-dump on Updatelessness and real agents

Building a Son is just committing to a whole policy for the future. In the formalism where our agent uses probability distributions, and ex interim expected value maximization decides your action... the only way to ensure dynamic stability (for your Son to be identical to you) is to be completely Updateless. That is, to decide something using your current prior, and keep that forever.

Luckily, real agents don't seem to work like that. We are more of an ensemble of selected-for heuristics, and it seems true, scope-sensitive, complete Updatelessness is very unlikely to come out of this process (although we do have local versions of non-true Updatelessness, like retributivism in humans).
In fact, it's not even exactly clear how my current brain-state could decide something for the whole future. It's not even well-defined, like when you're playing a board game and discover some move you were planning isn't allowed by the rules. There are ways to actually give an exhaustive definition, but I suspect the ones that most people would intuitively like are (when scrutinized) sneaking in parts of Updatefulness (which I think is the correct move).

More formally, it seems like what real-world agents do is much better-represented by what I call "Slow-learning Policy Selection". (Abram had a great post about this called "Policy Selection Solves Most Problems", which I can't find now.) This is a small agent (short computation time) recommending policies for a big agent to follow in the far future. But the difference with complete Updatelessness is that the small agent also learns (much more slowly than the big one). Thus, if the small agent thinks a policy (like paying up in Counterfactual Mugging) is the right thing to do, the big agent will implement this for a pretty long time. But eventually the small agent might change its mind, and start recommending a different policy. I basically think that all problems not solved by this are unsolvable in principle, due to the unavoidable trade-off between updating and not updating.[1]
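
As a concrete illustration of "Slow-learning Policy Selection", here's a minimal sketch (my own construction; the payoffs, exploration rate, and learning rate are made up): a small, slow-updating learner recommends one of two policies, the big agent executes the current recommendation each episode, and the recommendation only flips after the slow learner has accumulated enough evidence.

```python
import random

rng = random.Random(0)
POLICIES = ["pay_up", "refuse"]
value_estimate = {"pay_up": 0.5, "refuse": 0.5}   # small agent's slow beliefs
SLOW_LR = 0.005                                    # small agent learns slowly

def episode_return(policy: str) -> float:
    # Hypothetical environment: paying up looks costly per episode but is
    # occasionally rewarded a lot (e.g. by a predictor); refusing is safe.
    if policy == "pay_up":
        return 10.0 if rng.random() < 0.2 else -1.0
    return 0.5

recommended = "refuse"
for _ in range(5_000):
    # Big agent implements whatever the small agent currently recommends,
    # plus a little exploration so both policies keep getting data.
    policy = recommended if rng.random() > 0.05 else rng.choice(POLICIES)
    r = episode_return(policy)
    # Small agent's slow update toward the observed return of that policy.
    value_estimate[policy] += SLOW_LR * (r - value_estimate[policy])
    recommended = max(POLICIES, key=value_estimate.get)

print(value_estimate, "->", recommended)  # typically ends up recommending pay_up
```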

This also has consequences for how we expect superintelligences to be. If by them having “vague opinions about the future” we mean a wide, but perfectly rigorous and compartmentalized probability distribution over literally everything that might happen, then yes, the way to maximize EV according to that distribution might be some very concrete, very risky move, like re-writing yourself into an algorithm because you think simulators will reward this, even if you’re not sure how well that algorithm performs in this universe.
But that’s not how abstractions or uncertainty work mechanistically! Abstractions help us efficiently navigate the world thanks to their modular, nested, fuzzy structure. If they had to compartmentalize everything in a rigorous and well-defined way, they’d stop working. When you take into account how abstractions really work, the kind of partial updatefulness we see in the world is what we'd expect. I might write about this soon.

  1. ^

    Surprisingly, in some conversations others still wanted to "get both updatelessness and updatefulness at the same time": that is, to receive the gains from Value of Information, and also those from Strategic Updatelessness. That is what Abram and I had in mind when starting this work, and it is, when you understand what these words really mean, impossible by definition.

Comment by Martín Soto (martinsq) on 'Empiricism!' as Anti-Epistemology · 2024-03-16T22:17:54.922Z · LW · GW

Cool connections! Resonates with how I've been thinking about intelligence and learning lately.
Some more connections:

Indeed, those savvier traders might even push me to go look up that data (using, perhaps, some kind of internal action auction), in order to more effectively take the simple trader's money

That's reward/exploration hacking.
Although I do think that most of the time we "look up some data" in real life, it's not due to an internal heuristic / subagent being strategic enough to purposefully try to exploit others, but rather just because some earnest, simple heuristics that recommend looking up information have scored well in the past.

They haven't taken its money yet," said the Scientist, "But they will before it gets a chance to invest any of my money

I think this doesn't always happen. As good as the internal traders might be, the agent sometimes needs to explore, and that means giving up some of the agent's money.

Now, if I were an ideal Garrabrant inductor I would ignore these arguments, and only pay attention to these new traders' future trades. But I have not world enough or time for this; so I've decided to subsidize new traders based on how they would have done if they'd been trading earlier.

Here (starting at "Put in terms of Logical Inductors") I mention other "computational shortcuts" for inductors. Mainly, if two "categories of bets" seem pretty unrelated (they are two different specialized magisteria), then not having thick trade between them won't cost you much performance (and will avoid much computation).
You can have "meta-traders" betting on which categories of bets are unrelated (and testing them only sparsely, etc.), and use them to make your inductor more computationally efficient. Of course object-level traders already do this (decide where to look, etc.), and in the limit this will converge like a Logical Inductor, but I have the intuition this will converge faster (at least, in structured enough domains).
This is of course very related to my ideas and formalism on meta-heuristics.
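
As a rough illustration of that shortcut (my own construction, not from the linked post; the domains and numbers are made up), a single scalar "linkage" estimate stands in for a meta-trader's bet that two betting domains are unrelated, and the expensive cross-domain computation is only run sparsely, with frequency shrinking as that bet strengthens.

```python
import random

rng = random.Random(0)

def within_domain_update(domain: str) -> float:
    return 0.0        # placeholder: cheap, always-run per-domain trading

def cross_domain_profit() -> float:
    # Expensive computation estimating how much money cross-domain traders
    # would have made this round; here drawn from a made-up distribution
    # centered at zero (the domains really are unrelated).
    return rng.gauss(0.0, 0.05)

estimated_link = 1.0      # meta-trader's estimate that the domains are linked
compute_spent = 0
for _ in range(1_000):
    for domain in ("number_theory", "weather"):
        within_domain_update(domain)             # always run the cheap part
    # Sparse test: probability of running the expensive cross-domain traders
    # shrinks as the meta-trader becomes confident the domains are unrelated.
    if rng.random() < max(0.02, estimated_link):
        compute_spent += 1
        profit = cross_domain_profit()
        # Exponential moving average of |profit| as a crude "linkage" signal.
        estimated_link = 0.9 * estimated_link + 0.1 * min(1.0, abs(profit) * 2)

print(f"cross-domain runs: {compute_spent} / 1000, "
      f"estimated linkage: {estimated_link:.3f}")
```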

helps prevent clever arguers from fooling me (and potentially themselves) with overfitted post-hoc hypotheses

This adversarial selection is also a problem for heuristic arguments: Your heuristic estimator might be very good at assessing likelihoods given a list of heuristic arguments, but what if the latter has been selected against your estimator, to drive it in a wrong direction?
Last time I discussed this with them (very long ago), they were just happy to pick an apparently random process to generate the heuristic arguments, that they're confident enough hasn't been tampered with.
Something more ambitious would be to have the heuristic estimator also know about the process that generated the list of heuristic arguments, and use these same heuristic arguments to assess whether something fishy is going on. This will never work perfectly, but probably helps a lot in practice.
(And I think this is for similar reasons to why deception might be hard: when not only the output, but also the "thoughts", of the generating process are scrutinized, it seems hard for it to scheme without being caught.)

Comment by Martín Soto (martinsq) on How disagreements about Evidential Correlations could be settled · 2024-03-11T19:30:39.047Z · LW · GW

I think it would be helpful to have a worked example here -- say, the twin PD

As in my A B C example, I was thinking of the simpler case in which two agents disagree about their joint correlation to a third. If the disagreement happens between two sides of a twin PD, then they care about slightly different questions (how likely A is to Cooperate if B Cooperates, and how likely B is to Cooperate if A Cooperates), instead of the same question. And this presents complications in exposition. Although, if they wanted to, they could still share their heuristics, etc.

To be clear, I didn't provide a complete specification of "what action a and action c are" (which game they are playing), just because it seemed to distract from the topic. That is, the relevant part is their having different beliefs on any correlation, not its contents.

Uh oh, this is starting to sound like Oesterheld's Decision Markets stuff. 

Yes! But only because I'm directly thinking of Logical Inductors, which are the same for epistemics. Better said, Caspar throws everything (epistemics and decision-making) into the traders, and here I am still using Inductors, which only throw epistemics into the traders.

My point is:
"In our heads, we do logical learning by a process similar to Inductors. To resolve disagreements about correlations, we can merge our Inductors in different ways. Some are lower-bandwidth and frugal, while others are higher-bandwidth and expensive."
Exactly analogous points could be made about our decision-making (instead of beliefs), thus the analogy would be to Decision Markets instead of Logical Inductors.

Comment by Martín Soto (martinsq) on nielsrolf's Shortform · 2024-03-10T02:58:13.236Z · LW · GW

Sounds a lot like this, or also my thoughts, or also shard theory!

Comment by Martín Soto (martinsq) on Evidential Correlations are Subjective, and it might be a problem · 2024-03-07T19:10:00.107Z · LW · GW

Thanks for the tip :)

Yes, I can certainly argue that. In a sense, the point is even deeper: we have some intuitive heuristics for what it means for players to have "similar algorithms", but what we ultimately care about is how likely it is that if I cooperate you cooperate; and when I don't cooperate, there's no ground truth about this. It is perfectly possible for one or both of the players to (due to their past logical experiences) believe they are not correlated, AND (this is the important point) if they thus don't cooperate, this belief will never be falsified. This "falling into a bad equilibrium by your own fault" is exactly the fundamental problem with FixDT (and, more generally, with fix-points and action-relevant beliefs).

More realistically, both players will continue getting observations about ground-truth math and playing games with other players, and so the question becomes whether these learnings will be enough to kick them out of any dumb equilibria.

Comment by Martín Soto (martinsq) on Evidential Correlations are Subjective, and it might be a problem · 2024-03-07T19:04:32.350Z · LW · GW

I think you're right, see my other short comments below about epsilon-exploration as a realistic solution. It's conceivable that something like "epsilon-exploration plus heuristics on top" groks enough regularities that performance at some finite time tends to be good. But who knows how likely that is.

Comment by Martín Soto (martinsq) on Why does generalization work? · 2024-03-06T20:23:52.749Z · LW · GW

I think that's the right next question!

The way I was thinking about it, the mathematical toy model would literally have the structure of microstates and macrostates. What we need is a set of (lawfully, deterministically) evolving microstates in which certain macrostate partitions (macroscopic regularities, like pressure) are statistically maintained throughout the evolution. And then, for my point, we'd need two different macrostate partitions (or sets of macrostate partitions) such that each one is statistically preserved. That is, complex macroscopic patterns that self-replicate (a human tends to stay in the macrostate partition of "the human being alive"). And they are mostly independent (humans can't easily learn about the completely different partition, otherwise they'd already be in the same partition).

In the direction of "not making it trivial", I think there's an irresolvable tension. If by "not making it trivial" you mean "s1 and s2 don't obviously look independent to us", then we can get this, but it's pretty arbitrary. I think the true name of "whether s1 and s2 are independent" is "statistical mutual information (of the macrostates)". And then, them being independent is exactly what we're searching for. That is, it wouldn't make sense to ask for "independent pattern-universes coexisting on the same substrate", while at the same time asking for "the pattern-universes (macrostate partitions) not to be truly independent".
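
Here's a minimal sketch of just the mutual-information part of this (my own construction; it ignores the dynamics and statistical preservation): microstates are 8-bit integers, one macrostate partition reads only the low bits, another reads only the high bits, and a third overlaps with the first.

```python
from collections import Counter
from math import log2

MICROSTATES = range(256)                      # all 8-bit microstates

def macro_A(s: int) -> int: return s & 0x0F          # low nibble
def macro_B(s: int) -> int: return (s >> 4) & 0x0F   # high nibble
def macro_C(s: int) -> int: return s & 0x1F          # overlaps with macro_A

def mutual_information(f, g) -> float:
    # Empirical mutual information of the two macrostate variables under a
    # uniform distribution over microstates.
    joint = Counter((f(s), g(s)) for s in MICROSTATES)
    pf = Counter(f(s) for s in MICROSTATES)
    pg = Counter(g(s) for s in MICROSTATES)
    n = len(MICROSTATES)
    return sum((c / n) * log2((c / n) / ((pf[a] / n) * (pg[b] / n)))
               for (a, b), c in joint.items())

print(mutual_information(macro_A, macro_B))   # 0.0: truly independent partitions
print(mutual_information(macro_A, macro_C))   # > 0: partitions that share structure
```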

I think this successfully captures the fact that my point/realization is, at its heart, trivial. And still, possibly deconfusing about the observer-dependence of world-modelling.

Comment by Martín Soto (martinsq) on Why does generalization work? · 2024-03-06T20:12:40.783Z · LW · GW

My post is consistent with what Eliezer says there. My post would simply remark:
You are already taking for granted a certain low-level / atomic set of variables = macro-states (like mortal, featherless, biped). Let me bring to your attention that you pay attention to these variables because they are written in a macro-state partition similar / useful to your own. It is conceivable for some external observer to look at low-level physics, and interpret it through different atomic macro-states (different from mortal, featherless, biped).

The same applies to unsupervised learning. It's not surprising that it finds macro-states expressed in a certain language (that of the computational methods we've built to find simple regularities in certain sets of macroscopic variables). As before, there simply are already some macro-state partitions we pay attention to, in which these macroscopic variables are expressed (but not others, like "the exact position of a particle"), and also in which we build our tools (similarly to how our sensory perceptors are also built in them).

Comment by Martín Soto (martinsq) on Why does generalization work? · 2024-03-06T19:58:14.143Z · LW · GW

By random I just meant "no simple underlying regularity explains it compactly". For example, a low-degree polynomial has a very short description length, while a random jumble of points doesn't (you need to write the points one by one). This of course already assumes a language.
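
A minimal sketch of this contrast (my own construction; the JSON encoding plays the role of the assumed language): the polynomial data can be regenerated from a few coefficients, while the random points can only be listed one by one.

```python
import json
import random

rng = random.Random(0)
xs = list(range(100))

# Structured data: y = 2x^3 - x + 5, fully regenerable from 4 coefficients.
poly_ys = [2 * x**3 - x + 5 for x in xs]
coeffs = [2, 0, -1, 5]
poly_description = json.dumps(coeffs)
decoded = [coeffs[0] * x**3 + coeffs[1] * x**2 + coeffs[2] * x + coeffs[3]
           for x in xs]
assert decoded == poly_ys                    # the short description suffices

# Unstructured data: 100 independent random values; in this toy language
# there is no shorter description than the list itself.
random_ys = [rng.randint(-1000, 1000) for _ in xs]
random_description = json.dumps(random_ys)

print(len(poly_description), "bytes for the polynomial rule")
print(len(random_description), "bytes for the random points")
```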

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-03-06T19:43:29.024Z · LW · GW

Thank you Sylvester for the academic reference, and Wei for your thoughts!

I do understand from the SEP, like Wei, that sophisticated means "backwards planning", and resolute means "being able to commit to a policy" (correct me if I'm wrong).

My usage of "dynamic instability" (which might be contrary to academic usage) was indeed what Wei mentions: "not needing commitments". Or equivalently, I say a decision theory is dynamically stable if itself and its resolute version always act the same.

There are some ways to formalize exactly what I mean by "not needing commitments", for example see here, page 3, Desiderata 2 (Tiling result), although that definition is pretty in the weeds.

Comment by Martín Soto (martinsq) on Martín Soto's Shortform · 2024-02-28T22:38:10.179Z · LW · GW

Marginally against legibilizing my own reasoning:

When taking important decisions, I spend too much time writing down the many arguments, and legibilizing the whole process for myself. This is due to completionist tendencies. Unfortunately, a more legible process doesn’t overwhelmingly imply a better decision!

Scrutinizing your main arguments is necessary, although this looks more like intuitively assessing their robustness in concept-space than making straightforward calculations, given how many implicit assumptions they all have. I can fill in many boxes, and count and weigh considerations in-depth, but that’s not a strong signal, and it’s almost never what ends up swaying me towards a decision!

Rather than folding, re-folding and re-playing all of these ideas inside myself, it’s way more effective time-wise to engage my System 1 more: intuitively assess the strength of different considerations, try to brainstorm new ways in which the hidden assumptions fail, try to spot the ways in which the information I’ve received is partial… And of course, share all of this with other minds, who are much more likely to update me than my own mind. All of this looks more like rapidly racing through intuitions than filling Excel sheets, or having overly detailed scoring systems.

For example, do I really think I can BOTEC the expected counterfactual value (IN FREAKING UTILONS) of a new job position? Of course a bad BOTEC is better than none, but the extent to which that is not how our reasoning works, and the work is not really done by the BOTEC at all, is astounding. Maybe at that point you should stop calling it a BOTEC.

Comment by Martín Soto (martinsq) on CFAR Takeaways: Andrew Critch · 2024-02-24T22:16:08.904Z · LW · GW

This is pure gold, thanks for sharing!

Comment by Martín Soto (martinsq) on Why does generalization work? · 2024-02-21T06:16:14.716Z · LW · GW

Didn't know about ruliad, thanks!

I think a central point here is that "what counts as an observer (an agent)" is observer-dependent (more here) (even if under our particular laws of physics there are some pressures towards agents having a certain shape, etc., more here). And then it's immediate each ruliad has an agent (for the right observer) (or similarly, for a certain decryption of it).

I'm not yet convinced "the mapping function/decryption might be so complex it doesn't fit our universe" is relevant. If you want to philosophically defend "functionalism with functions up to complexity C" instead of "functionalism", you can, but C starts seeming arbitrary?

Also, a Ramsey-theory argument would be very cool.

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-20T21:33:03.480Z · LW · GW

- Chan master Yunmon

Comment by Martín Soto (martinsq) on Why does generalization work? · 2024-02-20T21:29:14.710Z · LW · GW

Yep! Although I think the philosophical point goes deeper. The algorithm our brains themselves use to find a pattern is part of the picture. It is a kind of "fixed (de/)encryption".

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-20T20:39:23.519Z · LW · GW

Thank you, habryka!

As mentioned in my answer to Eliezer, my arguments were made with that correct version of updatelessness in mind (not "being scared to learn information", but "ex ante deciding whether to let this action depend on this information"), so they hold, according to me.
But it might be true I should have stressed this point more in the main text.

Comment by Martín Soto (martinsq) on The lattice of partial updatelessness · 2024-02-20T20:36:13.489Z · LW · GW

Yep! I hadn't included pure randomization in the formalism, but it can be done and will yield some interesting insights.

As you mention, we can also include pseudo-randomization. And taking these bounded rationality considerations into account also makes our reasoning richer and more complex: it's unclear exactly when an agent wants to obfuscate its reasoning from others, etc.

Comment by Martín Soto (martinsq) on The lattice of partial updatelessness · 2024-02-20T20:34:03.455Z · LW · GW

First off, that  was supposed to be , sorry.

The agent might commit to "only updating on those things accepted by program ", even when it still doesn't have the complete infinite list of "exactly in which things does  update" (in fact, this is always the case, since we can't hold an infinite list in our head). It will, at the time of committing, know that  updates on certain things, doesn't update on others... and it is uncertain about exactly what it does in all other situations. But that's okay, that's what we do all the time: decide on an endorsed deliberation mechanism based on its structural properties, without yet being completely sure of what it does (otherwise, we wouldn't need the deliberation). But it does advise against committing while being too ignorant.
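
Here's a minimal sketch of that kind of commitment (my own construction; the filter rule and names like `accepts` are hypothetical): the agent fixes a screening program ex ante and lets its action depend only on the observations that program accepts, without having enumerated in advance everything the program will accept.

```python
from typing import Callable

def accepts(observation: str) -> bool:
    # Stand-in filter, chosen for its structural properties: update on
    # ordinary empirical observations, ignore messages that look like
    # bargaining threats. The agent commits to this rule before knowing
    # the full (infinite) list of what it accepts.
    return not observation.startswith("threat:")

def act(observations: list[str], policy: Callable[[tuple], str]) -> str:
    visible = tuple(obs for obs in observations if accepts(obs))
    return policy(visible)   # the action depends only on accepted observations

def example_policy(visible: tuple) -> str:
    # The policy only ever sees the screened observations.
    return "cooperate" if "payoff: high" in visible else "explore"

# The threat is screened off, so it cannot influence the action.
print(act(["payoff: high", "threat: pay me or else"], example_policy))  # cooperate
```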

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-20T20:29:30.067Z · LW · GW

Is it possible for all possible priors to converge on optimal behavior, even given unlimited observations?

Certainly not, in the most general case, as you correctly point out.

Here I was studying a particular case: updateless agents in a world remotely looking like the real world. And even more particular: thinking about the kinds of priors that superintelligences created in the real world might actually have.

Eliezer believes that, in these particular cases, it's very likely we will get optimal behavior (we won't get trapped priors, nor commitment races). I disagree, and that's what I argue in the post.

I'm also surprised that dynamic stability leads to suboptimal outcomes that are predictable in advance. Intuitively, it seems like this should never happen.

If by "predictable in advance" you mean "from the updateless agent's prior", then nope! Updatelessness maximizes EV from the prior, so it will do whatever looks best from this perspective. If that's what you want, then updatelessness is for you! The problem is, we have many pro tanto reasons to think this is not a good representation of rational decision-making in reality, nor the kind of cognition that survives for long in reality. Because of considerations about "the world being so complex that your prior will be missing a lot of stuff". And in particular, multi-agentic scenarios are something that makes this complexity sky-rocket.
Of course, you can say "but that consideration will also be included in your prior". And that does make the situation better. But eventually your prior needs to end. And I argue, that's much before you have all the necessary information to confidently commit to something forever (but other people might disagree with this).

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-20T20:23:09.050Z · LW · GW

Is this consistent with the way you're describing decision-making procedures as updateful and updateless?

Absolutely. A good implementation of UDT can, from its prior, decide on an updateful strategy. It's just it won't be able to change its mind about which updateful strategy seems best. See this comment for more.

"flinching away from true information"

As mentioned also in that comment, correct implementations of UDT don't actually flinch away from information: they just decide ex ante (when still not having access to that information) whether or not they will let their future actions depend on it.

The problem remains though: you make the ex ante call about which information to "decision-relevantly update on", and this can be a wrong call, and this creates commitment races, etc.

Comment by Martín Soto (martinsq) on Natural abstractions are observer-dependent: a conversation with John Wentworth · 2024-02-20T20:12:16.679Z · LW · GW

I'm not sure we are in disagreement. No one is denying that the territory shapes the maps (which are part of the territory). The central point is just that our perception of the territory is shaped by our perceptors, etc., and need not be the same. It is still conceivable that, due to how the territory shapes this process (due to the most likely perceptors to be found in evolved creatures, etc.), there ends up being a strong convergence, so that all maps isomorphically represent certain territory properties. But this is not a given, and needs further argumentation. After all, it is conceivable for a territory to exist that incentivizes the creation of two very different and non-isomorphic types of maps. But of course, you can argue our territory is not such, by looking at its details.

Where “joint carvy-ness” will end up being, I suspect, related to “gears that move the world,” i.e., the bits of the territory that can do surprisingly much, have surprisingly much reach, etc.

I think this falls into the same circularity I point at in the post: you are defining "naturalness of a partition" as "usefulness to efficiently affect / control certain other partitions", so you already need to care about the latter. You could try to say something like "this one partition is useful for many partitions", but I think that's physically false, by combinatorics (you can always build just as many partitions that are affected by any other given one). More on these philosophical subtleties here: Why does generalization work?

Comment by Martín Soto (martinsq) on OpenAI's Sora is an agent · 2024-02-16T19:12:15.980Z · LW · GW

Guy who reinvents predictive processing through Minecraft

Comment by Martín Soto (martinsq) on The Commitment Races problem · 2024-02-15T18:58:50.509Z · LW · GW

I agree most superintelligences won't do something which is simply "play the ordinal game" (it was just an illustrative example), and that a superintelligence can implement your proposal, and that it is conceivable most superintelligences implement something close enough to your proposal that they reach Pareto-optimality. What I'm missing is why that is likely.

Indeed, the normative intuition you are expressing (that your policy shouldn't in any case incentivize the opponent to be more sophisticated, etc.) is already a notion of fairness (although in the first meta-level, rather than object-level). And why should we expect most superintelligences to share it, given the dependence on early beliefs and other pro tanto normative intuitions (different from ex ante optimization)? Why should we expect this to be selected for? (Either inside a mind, or by external survival mechanisms)
Compare, especially, to a nascent superintelligence who believes most others might be simulating it and best-responding (thus wants to be stubborn). Why should we think this is unlikely?
Probably if I became convinced trapped priors are not a problem I would put much more probability on superintelligences eventually coordinating.

Another way to put it is: "Sucks to be them!" Yes sure, but also sucks to be me who lost the $1! And maybe sucks to be me who didn't do something super hawkish and got a couple other players to best-respond! While it is true these normative intuitions pull on me less than the one you express, why should I expect this to be the case for most superintelligences?

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-15T05:37:26.541Z · LW · GW

Thank you for engaging, Eliezer.

I completely agree with your point: an agent being updateless doesn't mean it won't learn new information. In fact, it might well decide to "make my future action A depend on future information X", if the updateless prior finds that optimal. While in other situations, when the updateless prior deems it net-negative (maybe due to other agents exploiting this future dependence), it won't.

This point is already observed in the post (see e.g. footnote 4), although without going deep into it, due to the post being meant for the layman (it is more deeply addressed, for example, in section 4.4 of my report). Also for illustrative purposes, in two places I have (maybe unfairly) caricatured an updateless agent as being "scared" of learning more information. While really, what this means (as hopefully clear from earlier parts of the post) is "the updateless prior assessed whether it seemed net-positive to let future actions depend on future information, and decided no (for almost all actions)".

The problem I present is not "being scared of information", but the trade-off between "letting your future action depend on future information X" vs "not doing so" (and, in more detail, how exactly it should depend on such information). More dependence allows you to correctly best-respond in some situations, but could also sometimes get you exploited. The problem is there's no universal (belief-independent) rule to assess when to allow for dependence: different updateless priors will decide differently. And they need to do so in advance of letting their deliberation depend on their interactions (they still don't know if that's net-positive).
Due to this prior-dependence, if different updateless agents have different beliefs, they might play very different policies, and miscoordinate. This is also analogous to different agents demanding different notions of fairness (more here). I have read no convincing arguments as to why most superintelligences will converge on beliefs (or notions of fairness) that successfully coordinate on Pareto optimality (especially in the face of the problem of trapped priors i.e. commitment races), and would be grateful if you could point me in their direction.

I interpret you as expressing a strong normative intuition in favor of ex ante optimization. I share this primitive intuition, and indeed it remains true that, if you have some prior and simply want to maximize its EV, updatelessness is exactly what you need. But I think we have discovered other pro tanto reasons against updatelessness, like updateless agents probably performing worse on average (in complex environments) due to trapped priors and increased miscoordination.

Comment by Martín Soto (martinsq) on The Commitment Races problem · 2024-02-15T05:27:58.880Z · LW · GW

The normative pull of your proposed procedure seems to come from a preconception that "the other player will probably best-respond to me" (and thus, my procedure is correctly shaping its incentives).

But instead we can consider the other player trying to get us to best-respond to them, by jumping up a meta-level: the player checks whether I am playing your proposed policy with a certain notion of fairness $X (which in your case is $5), and punishes according to how far their notion of fairness $Y is from my $X, so that I (if I were to best-respond to their policy) would be incentivized to adopt their notion of fairness $Y.

It seems clear that, for the exact same reason your argument might have some normative pull, this other argument has some normative pull in the opposite direction. It then becomes unclear which has stronger normative pull: trying to shape the incentives of the other (because you think they might play a policy one level of sophistication below yours), or trying to best-respond to the other (because you think they might play a policy one level of sophistication above yours).

I think this is exactly the deep problem, the fundamental trade-off, that agents face in both empirical and logical bargaining. I am not convinced all superintelligences will resolve this trade-off in similar enough ways to allow for Pareto-optimality (instead of falling for trapped priors i.e. commitment races), due to the resolution's dependence on the superintelligences' early prior.

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-13T03:18:02.406Z · LW · GW

(Sorry, short on time now, but we can discuss in-person and maybe I'll come back here to write the take-away)

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-11T04:25:57.226Z · LW · GW

To me it feels like the natural place to draw the line is update-on-computations but updateless-on-observations.

A first problem with this is that there is no sharp distinction between purely computational (analytic) information/observations and purely empirical (synthetic) information/observations. This is a deep philosophical point, well-known in the analytic philosophy literature, and best represented by Quine's Two dogmas of empiricism, and his idea of the "Web of Belief". (This is also related to Radical Probabilism.)
But it's unclear if this philosophical problem translates to a pragmatic one. So let's just assume that the laws of physics are such that all superintelligences we care about converge on the same classification of computational vs empirical information.

A second and more worrying problem is that, even given such convergence, it's not clear all other agents will decide to forego the possible apparent benefits of logical exploitation. It's a kind of Nash equilibrium selection problem: if I were very sure all other agents forego them (and have robust cooperation mechanisms that deter exploitation), then I would just do like them. And indeed, it's conceivable that our laws of physics (and algorithmics) are such that this is the case, and all superintelligences converge on the Schelling point of "never exploiting the learning of logical information". But my probability of that is not very high, especially due to worries that different superintelligences might start with pretty different priors, and make commitments early on (before all posteriors have had time to converge). (That said, my probability is high that almost all deliberation is mostly safe, for more contingent reasons related to the heuristics they use and the values they have.)
You might also want to say something like "they should just use the correct decision theory to converge on the nicest Nash equilibrium!". But that's question-begging, because the worry is exactly that others might have different notions of this normative "nice" (indeed, there is no objective criterion for decision theory). The problem recurs: we can't just invoke a decision theory to decide on the correct decision theory.

Am I missing something about why logical counterfactual muggings are likely to be common?

As mentioned in the post, Counterfactual Mugging as presented won't be common, but equivalent situations in multi-agentic bargaining might, due to (the naive application of) some priors leading to commitment races. (And here "naive" doesn't mean "shooting yourself in the foot", but rather "doing what looks best from the prior", even if unbeknownst to you it has dangerous consequences.)

if it comes up it seems that an agent that updates on computations can use some precommitment mechanism to take advantage of it

It doesn't look like something as simple as that will work, because of reasoning as in this paragraph:

Unfortunately, it’s not that easy, and the problem recurs at a higher level: your procedure to decide which information to use will depend on all the information, and so you will already lose strategicness. Or, if it doesn’t depend, then you are just being updateless, not using the information in any way.

Or in other words, you need to decide on the precommitment ex ante, when you still haven't thought much about anything, so your precommitment might be bad.
(Although to be fair there are ongoing discussions about this.)

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-09T22:24:52.538Z · LW · GW

It seems like we should be able to design software systems that are immune to any infohazard

As mentioned in another comment, I think this is not possible to solve in full generality (meaning, for all priors), because that requires complete updatelessness and we don't want to do that.

I think all your proposed approaches are equivalent (and I think the most intuitive framing is "cognitive sandboxes"). And I think they don't work, because of reasoning close to this paragraph:

Unfortunately, it’s not that easy, and the problem recurs at a higher level: your procedure to decide which information to use will depend on all the information, and so you will already lose strategicness. Or, if it doesn’t depend, then you are just being updateless, not using the information in any way.

But again, the problem might be solvable in particular cases (like, our prior).

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-09T22:19:16.547Z · LW · GW

The motivating principle is to treat one's choice of decision theory as itself strategic.

I share the intuition that this lens is important. Indeed, there might be some important quantitative differences between
a) I have a well-defined decision theory, and am choosing how to build my successor
and
b) I'm doing some vague normative reasoning to choose a decision theory (like we're doing right now),
but I think these differences are mostly contingent, and the same fundamental dynamics about strategicness are at play in both scenarios.

Design your decision theory so that no information is hazardous to it

I think this is equivalent to your decision theory being dynamically stable (that is, its performance never improves by having access to commitments), and I'm pretty sure the only way to attain this is complete updatelessness (which is bad).

That said, again, it might perfectly be that given our prior, many parts of cooperation-relevant concept-space seem very safe to explore, and so "for all practical purposes" some decision procedures are basically completely safe, and we're able to use them to coordinate with all agents (even if we haven't "solved in all prior-independent generality" the fundamental trade-off between updatelessness and updatefulness).

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-09T02:05:25.477Z · LW · GW

I agree that the situation is less
"there is a theoretical problem which is solvable but our specification of Updatelessness is not solving"
and more
"there is a fundamental obstacle in game-theoretic interactions (at least the way we model them)".

Of course, even if this obstacle is "unavoidable in principle" (and no theoretical solution will get rid of it completely and for all situations), there are many pragmatic and realistic solutions (partly overfit to the situation we already know we are actually in) that can improve interactions. So much so as to conceivably even dissolve the problem into near-nothingness (although I'm not sure I'm that optimistic).

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-09T02:01:28.314Z · LW · GW

You're right that (a priori and in the abstract) "bargaining power" fundamentally trades off against "best-responding". That's exactly the point of my post. This doesn't prohibit, though, that a lot of pragmatic and realistic improvements are possible (because we know agents in our reality tend to think like this or like that), even if the theoretical trade-off can never be erased completely or in all situations and for all priors.

Your latter discussion is a normative one. And while I share your normative intuition that best-responding completely (being completely updateful) is not always the best thing to do in realistic situations, I do have quibbles with this kind of discourse (similar to this). For example, why would I want to go Straight even after I have learned the other does? Out of some terminal valuation of fairness, or of counterfactuals, more than anything, I think (more here). Or similarly, why should I think sticking to my notion of fairness will ex ante convince the other player to coordinate on it, as opposed to the other player trying to pull some "even more meta" move, like punishing notions of fairness that are not close enough to theirs? Again, all of this will depend on our priors.

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-09T00:36:49.817Z · LW · GW

This is exactly the kind of procedure which might get hindered by commitment races, because it involves "thinking more about what the other agents will do", and the point of commitment races is that sometimes (and depending on your beliefs) this can seem net-negative ex ante (that is, before actually doing the thinking).

Of course, this doesn't prohibit logical handshakes from being enacted sometimes. For example, if all agents start with a high enough prior on others enacting their part of , then they will do it. More realistically, it probably won't be as easy as this, but if it is the case that all agents feel safe enough thinking about  (they deem it unlikely this backfires into losing bargaining power), and/or the upshot is sufficiently high (when multiplied by the probability and so on), then all agents will deem it net-positive to think more about  and the others, and eventually they'll implement it.

So it comes down to how likely we think priors (or the equivalent thing for AIs) are to successfully fall into this coordination basin, as opposed to getting stuck at some earlier point without wanting to think more. And again, we have a few pro tanto reasons to expect coordination to be viable (and a few in the other direction). I do think that, out of my list of statements, logical handshakes in causal interactions might be one of the most likely ones.

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-08T19:41:14.596Z · LW · GW

Another coarse, on-priors consideration that I could have added to the "Other lenses" section:

Eliezer says something like "surely superintelligences will be intelligent enough to coordinate on Pareto-optimality (and not fall into something like commitment races), and easily enact logical or value handshakes". But here's why I think this outside-view consideration need not hold. It is a generally good heuristic to think superintelligences will be able to solve tasks that seem impossible to us. But I think this stops being the case for tasks whose difficulty / complexity grows with the size / computational power / intelligence level of the superintelligence. For a task like "beating a human at Go" or "turning the solar system into computronium", the difficulty of the task is constant (relative to the size of the superintelligence you're using to solve it). For a task like "beat a copy of yourself in Go", that's clearly not the case (well, unless Go has a winning strategy that a program within our universe can enact, which would be a ceiling on difficulty). I claim "ensuring Pareto-optimality" is more like the latter. When the intelligence or compute of all players grows, it is true they can find more clever and sure-fire ways to coordinate robustly, but it's also true that they can individually find more clever ways of tricking the system and getting a bit more of the pie (and in some situations, they are individually incentivized to do this). Of course, one might still hold that the first will grow much more than the latter, and so after a certain level of intelligence, agents of a similar intelligence level will easily coordinate. But that's an additional assumption, relative to the "constant-difficulty" cases.

Of course, if Eliezer believes this, it is not really because of outside-view considerations like the above, but because of his inside view on decision theory. But I generally disagree with his takes there (for example here), and have never found convincing arguments (from him or anyone else) for the easy coordination of superintelligences.

Comment by Martín Soto (martinsq) on Updatelessness doesn't solve most problems · 2024-02-08T19:24:48.929Z · LW · GW

You're right that there's something weird going on with fix-points and determinism: both agents are just algorithms, and in some sense there is already a mathematical fact of the matter about what each outputs. The problem is that neither of them knows this in advance (exactly because of the non-terminating computation problem), and so (while still reasoning about which action to output) they are logically uncertain about what they and the other will output.

If an agent believes that the other's action is completely independent of their own, then surely no commitment race will ensue. But say, for example, that they believe their taking action A makes it more likely the other takes action B. This belief could be justified in a number of different ways: because they believe the other to be perfectly simulating them, because they believe the other to be imperfectly simulating them (and notice, both agents can imperfectly simulate each other, and consider this to give them better-than-chance knowledge about the other), because they believe they can influence the truth of some mathematical statements (EDT-like) that the other will think about, etc.

And furthermore, this doesn't apply only to the end actions they choose: it can also apply to the mental moves they perform before coming to those actions. For example, maybe an agent assigns a high enough probability to "the other will just simulate me, and best-respond" (and thus, "I should just be aggressive"). But an agent could also go one level higher, and think "if I simulate the other, they will probably notice (for example, by coarsely simulating me, or noticing some properties of my code), and be aggressive. So I won't do that (and then it's less likely they're aggressive)".

Another way to put all this is that one of them can go "first" in logical time (at the cost of having thought less about the details of their strategy).

Of course, we have some reasons to think the priors needed for the above to happen are especially wacky, and so unlikely. But again, one worry is that this could happen pretty early on, when the AGI still has such wacky and unjustified beliefs.
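
To make the kind of prior-dependence above concrete, here is a minimal Chicken-style sketch (the payoffs and probabilities are illustrative assumptions, not anything from the post):

```python
# Each agent either COMMITs to Dare before thinking, or THINKs more (simulates the
# other and best-responds). Whether committing looks good depends purely on the prior.

PAYOFF = {  # (my move, their move) -> my payoff
    ("Dare", "Dare"): 0, ("Dare", "Swerve"): 3,
    ("Swerve", "Dare"): 1, ("Swerve", "Swerve"): 2,
}

def ev_commit(p_other_commits):
    # If the other also committed, we crash; otherwise they simulate me and swerve.
    return (p_other_commits * PAYOFF[("Dare", "Dare")]
            + (1 - p_other_commits) * PAYOFF[("Dare", "Swerve")])

def ev_think(p_other_commits):
    # If the other committed, I best-respond by swerving; otherwise we both swerve.
    return (p_other_commits * PAYOFF[("Swerve", "Dare")]
            + (1 - p_other_commits) * PAYOFF[("Swerve", "Swerve")])

for p in [0.1, 0.5, 0.9]:
    print(p, ev_commit(p), ev_think(p))
# With a low prior that the other has already committed (p=0.1), committing looks
# better ex ante (2.7 vs 1.9) -- and if both agents reason this way early on, both
# commit and they crash.
```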

Comment by Martín Soto (martinsq) on A Shutdown Problem Proposal · 2024-01-22T11:19:12.741Z · LW · GW

Brain-storming fixes:

  • Each subagent's bargaining power is how much compute they can use. This makes everything more chaotic, and is clearly not what you had in mind with this kind of idealized agents solution.
  • Probabilistic vetoes, such that those of some subagents are less likely to work. I think this breaks things in your proposal, and it still has the game-theoretic problems.
  • We ensure the priors of each subagent (about how the others respond) are such that going for risky game-theoretic stuff is not individually rational. Maybe some agents have more optimistic priors and others less optimistic, and this results in the former controlling more, and the latter only trying to use their veto in extreme cases (like ensuring the wrong successor is not built). But it'd be fiddly to think about the effect of these different priors on behavior, and about how "extreme" the cases need to be for the veto to get used. And this might also mess up the agent's interactions with the world in other ways: for example, dogmatically believing that algorithms which look like subagents have "exactly this behavior", which is sometimes false. Although of course this kind of problem was already present in your proposal.
Comment by Martín Soto (martinsq) on A Shutdown Problem Proposal · 2024-01-22T11:12:47.646Z · LW · GW

Then there’s the problem of designing the negotiation infrastructure, and in particular allocating bargaining power to the various subagents. They all get a veto, but that still leaves a lot of degrees of freedom in exactly how much the agent pursues the goals of each subagent. For the shutdown use-case, we probably want to allocate most of the bargaining power to the non-shutdown subagent, so that we can see what the system does when mostly optimizing for u_1 (while maintaining the option of shutting down later).

I don't understand what you mean by "allocating bargaining power", given that each agent already has true veto power. Regardless of the negotiation mechanism you set up for them (if it's high-bandwidth enough), or whether the master agent says "I'd like this or that agent to have more power", each subagent could go "give me my proportional (1/n) part of the slice, or else I will veto everything" (and depending on its prior about how the other agents would respond, this will seem net-positive to do).

In fact, that's just the tip of the iceberg of individually rational game-theoretic stuff (that messes with your proposal) they could pull off; see Commitment Races.
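
A minimal sketch of why that 1/n-or-veto threat can look individually rational (the pie, the shares and the prior are all made-up numbers):

```python
# A subagent with a veto considers "give me 1/n of the pie, or I veto everything".
def ev_demand(p_others_cave, n_subagents, pie=1.0):
    fair_share = pie / n_subagents
    return p_others_cave * fair_share  # if they don't cave, the veto leaves everyone with 0

def ev_accept(allocated_share, pie=1.0):
    return allocated_share * pie

# A subagent the designer tried to give only 5% of the bargaining power, with a 60%
# prior that the others cave to its threat:
print(ev_demand(p_others_cave=0.6, n_subagents=4))  # 0.15
print(ev_accept(allocated_share=0.05))              # 0.05 -> the threat looks net-positive
```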

Comment by Martín Soto (martinsq) on Martín Soto's Shortform · 2024-01-20T10:41:06.085Z · LW · GW

The Singularity

Why is a rock easier to predict than a block of GPUs computing? Because the block of GPUs is optimized so that its end-state depends on a lot of computation.
[Maybe by some metric of “good prediction” it wouldn’t be much harder, because “only a few bits change”, but we can easily make it the case that those bits get augmented to affect whatever metric we want.]
Since prediction is basically “replicating / approximating in my head the computation made by physics”, it’s to be expected that if there’s more computation that needs to be finely predicted, the task is more difficult.
In reality, there is (at the low level of quantum physics) just as much total computation going on in the rock, but most of it (those lower levels) is screened off enough from macro behavior (in some circumstances) that we can use very accurate heuristics to ignore it, and go “the rock will not move”. This is purposefully subverted in the GPU case: to cram a lot of useful computation into a small amount of space and resources, the micro computations (at the level of circuitry) are carefully preserved and amplified, instead of getting screened off due to chaos.
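
A small, self-contained illustration of the contrast (my example, not the shortform’s): a rock-like macrostate ignores the micro details, while a GPU-like computation amplifies every input bit, so the only way to predict it finely is to essentially re-run it. Here sha256 just stands in for “a long computation whose end-state depends on everything”.

```python
import hashlib

def rock_macrostate(micro_details: bytes) -> str:
    # Screened off: the macro prediction doesn't depend on the micro details at all.
    return "the rock will not move"

def gpu_output(program_input: bytes) -> str:
    # Stand-in for a long computation whose end-state depends on every input bit.
    return hashlib.sha256(program_input).hexdigest()

print(rock_macrostate(b"thermal noise A") == rock_macrostate(b"thermal noise B"))  # True
print(gpu_output(b"input A") == gpu_output(b"input B"))                            # False
```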

Say we define the Singularity as “when the amount of computation / gram of matter (say, on Earth) exceeds a certain threshold”. What’s so special about this? Well, exactly for the same reason as above, an increase in this amount makes the whole setup harder to predict. Some time before the threshold, maybe we can confidently predict some macro properties of Earth for the next 2 months. Some time after it, maybe we can barely predict that for 1 minute.

But why would we care about this change in speed? After all, for now (against the backdrop of real clock time in physics) it doesn’t really matter whether a change in human history takes 1 year or 1 minute to happen.
[In the future maybe it does start mattering because we want to cram in more utopia before heat death, or because of some other weird quirk of physics.]
What really matters is how far we can predict “in terms of changes”, not “in terms of absolute time”. Both before and after the Singularity, I might be able to predict what happens to humanity for the next X FLOP (of total cognitive labor employed by all humanity, including non-humans). And that’s really what I care about, if I want to steer the future. The Singularity just makes it so these FLOP happen faster. So why be worried? If I wasn’t worried before about not knowing what happens after X+1 FLOP, and I was content with doing my best at steering given that limited knowledge, why should that change now?
[Of course, an option is that you were already worried about X FLOP not being enough, even if the Singularity doesn’t worsen it.]

The obvious reason is changes in differential speed. If I am still a biological human, then it will indeed be a problem that all these FLOP happen faster relative to clock time, since they are also happening faster relative to me, and I will have many fewer of my own FLOP with which to predict and control each batch of X FLOP made by humanity-as-a-whole.

In a scenario with uploads, my FLOP will also speed up. But the rest of humanity/machines won’t only speed up; they will also build way more thinking machines. So unless I speed up even more, or my own cognitive machinery also grows at that rate (via tools, copies of me, or enlarging my brain), the ratio of my FLOP to humanity’s FLOP will still decrease.
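
As a toy version of that ratio (all numbers made up):

```python
me_pre, humanity_pre = 1.0, 1_000.0        # arbitrary units of FLOP per unit time, pre-Singularity
print(me_pre / humanity_pre)               # 0.001

# Post-Singularity: I'm uploaded and run 100x faster, but humanity both speeds up
# and builds far more thinking machines, for (say) a 10,000x total increase.
me_post, humanity_post = me_pre * 100, humanity_pre * 10_000
print(me_post / humanity_post)             # 1e-05: my share of each batch of thinking
                                           # shrank, even though my clock speed went up
```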

But there’s conceivable reasons for worry, even if this ratio is held constant:

  • Maybe prediction becomes differentially harder with scale. That is, maybe using A FLOP (my cognitive machinery pre-Singularity) to predict X FLOP (that of humanity pre-Singularity) is easier than using 10A FLOP (my cognitive machinery post-Singularity) to predict 10X FLOP (that of humanity post-Singularity). But why? Can’t I just split the 10X into 10 bins, and use an A to predict each of them as satisfactorily as before? Maybe not, due to the newly complex interconnections between these bins. Of course, such complex interconnections also benefit my cognitive machinery. But maybe the benefit for prediction from having those interconnections in my machinery is lower than the downgrade from having them in the predicted computation.

[A priori this seems false if we extrapolate from past data, but who knows if this new situation has some important difference.]

  • Maybe some other properties of the situation (like the higher computation-density in the physical substrate requiring the computations to take on a slightly different, more optimal shape [this seems unlikely]) lead to the predicted computation having some new properties that make it harder to predict. Such properties need not even be something absolute, that “literally makes prediction harder for everyone” (even for intelligences with the right tools/heuristics). It could just be “if I had the right heuristics I might be able to predict this just as well as before (or better), but all my heuristics have been selected for the pre-Singularity computation (which didn’t have this property), and now I don’t know how to proceed”. [I could run a selection for heuristics again (for example, by running a copy of me growing up again), but that takes a lot more FLOP.]
Comment by Martín Soto (martinsq) on Gentleness and the artificial Other · 2024-01-05T22:28:42.451Z · LW · GW

You might enjoy this analysis of the piece of sci-fi you didn't want to spoil.

There’s a movie that I’m trying not to spoil, in which an AI in a female-robot-body makes a human fall in love with her, and then leaves him to die, trapped and screaming behind thick glass. One of the best bits, I think, is the way, once it is done, she doesn’t even look at him.