orthonormal's Shortform

orthonormal

orthonormal's Shortform

post by orthonormal · 2019-10-31T05:24:47.692Z · LW · GW · 38 comments

38 comments

38 comments

Comments sorted by top scores.

comment by orthonormal · 2024-10-01T04:14:17.860Z · LW(p) · GW(p)

With the sudden simultaneous exits of Mira Murati, Barret Zoph, and Bob McGrew, I thought I'd update my tally of the departures [LW(p) · GW(p)] from OpenAI, collated with how quickly the ex-employee had signed the loyalty letter to Sam Altman last November.

The letter was leaked at 505 signatures, 667 signatures, and finally 702 signatures; in the end, it was reported that 737 of 770 employees signed. Since then, I've been able to verify 56 departures of people who were full-time employees (as far as I can tell, contractors were not allowed to sign, but all FTEs were).

I still think I'm missing some, so these are lower bounds (modulo any mistakes I've made).

Headline numbers:

Attrition for the 505 OpenAI employees who signed before the letter was first leaked: at least 24/505 = 4.8%
Attrition for the next 197 to sign (it was leaked again at 667 signatures, and one last time at 702): at least 13/197 = 6.6%
Attrition for the (reported) 68 who had not signed by the last leak: at least 19/68 = 27.9%.

Reportedly, 737 out of the 770 signed in the end, and many of the Superalignment team chose not to sign at all.

Below are my current tallies of some notable subsets. Please comment with any corrections!

People from the Superalignment team who never signed as of the 702 leak (including some policy/governance people who seem to have been closely connected) and are now gone:

Carroll Wainwright
Collin Burns
Cullen O'Keefe
Daniel Kokotajlo
Jan Leike (though he did separately Tweet that the board should resign)
Jeffrey Wu
Jonathan Uesato
Leopold Aschenbrenner
Mati Roy
William Saunders
Yuri Burda

People from the Superalignment team (and close collaborators) who did sign before the final leak but are now gone:

Jan Hendrik Kirchner (signed between 668 and 702)
Steven Bills (signed between 668 and 702)
John Schulman (signed between 506 and 667)
Sherry Lachman (signed between 506 and 667)
Ilya Sutskever (signed by 505)
Pavel Izmailov (signed by 505)
Ryan Lowe (signed by 505)
Todor Markov (signed by 505)

Others who didn't sign as of the 702 leak (some of whom may have just been AFK for the wrong weekend, though I doubt that was true of Karpathy) and are now gone:

Andrei Alexandru (Research Engineer)
Andrej Karpathy (Co-Founder)
Austin Wiseman (Finance/Accounting)
Girish Sastry (Policy)
Jay Joshi (Recruiting)
Katarina Slama (Member of Technical Staff)
Lucas Negritto (Member of Technical Staff, then Developer Community Ambassador)
Zarina Stanik (Marketing)

Notable other ex-employees:

Barrett Zoph (VP of Research, Post-Training; signed by 505)
Bob McGrew (Chief Research Officer; signed by 505)
Chris Clark (Head of Nonprofit and Strategic Initiatives; signed by 505)
Diane Yoon (VP of People; signed by 505)
Gretchen Krueger (Policy; signed by 505; posted a significant Twitter thread at the time she left)
Mira Murati (CTO; signed by 505)

Replies from: Careful_correction

↑ comment by Careful_correction · 2024-10-01T05:41:44.963Z · LW(p) · GW(p)

There are a few people in this list who I think are being counted incorrectly as FTEs (Mati and Andrei, for example).

I would also be careful about making inferences based on timing of supposed signature: I have heard that the signature Google Doc had crashed and so the process for adding names was slow and cumbersome. That is, the time at which someone’s name was added may have been significantly after they expressed desire to sign.

Replies from: orthonormal

↑ comment by orthonormal · 2024-10-07T18:38:45.509Z · LW(p) · GW(p)

Mati described himself as a TPM since September 2023 (after being PM support since April 2022), and Andrei described himself as a Research Engineer from April 2023 to March 2024. Why do you believe either was not a FTE at the time?

And while failure to sign isn't proof of lack of desire to sign, the two are heavily correlated—otherwise it would be incredibly unlikely for the small Superalignment team to have so many members who signed late or not at all.

comment by orthonormal · 2019-11-01T16:24:35.569Z · LW(p) · GW(p)

DeepMind released their AlphaStar paper a few days ago, having reached Grandmaster level at the partial-information real-time strategy game StarCraft II over the summer.

This is very impressive, and yet less impressive than it sounds. I used to watch a lot of StarCraft II (I stopped interacting with Blizzard recently because of how they rolled over for China), and over the summer there were many breakdowns of AlphaStar games once players figured out how to identify the accounts.

The impressive part is getting reinforcement learning to work at all in such a vast state space- that took breakthroughs beyond what was necessary to solve Go and beat Atari games. AlphaStar had to have a rich enough set of potential concepts (in the sense that e.g. a convolutional net ends up having concepts of different textures) that it could learn a concept like "construct building P" or "attack unit Q" or "stay out of the range of unit R" rather than just "select spot S and enter key T". This is new and worth celebrating.

The overhyped part is that AlphaStar doesn't really do the "strategy" part of real-time strategy. Each race has a few solid builds that it executes at GM level, and the unit control is fantastic, but the replays don't look creative or even especially reactive to opponent strategies.

That's because there's no representation of causal thinking - "if I did X then they could do Y, so I'd better do X' instead". Instead there are many agents evolving together, and if there's an agent evolving to try Y then the agents doing X will be replaced with agents that do X'.

(This lack of causal reasoning especially shows up in building placement, where the consequences of locating any one building here or there are minor, but the consequences of your overall SimCity are major for how your units and your opponents' units would fare if they attacked you. In one comical case, AlphaStar had surrounded the units it was building with its own factories so that they couldn't get out to reach the rest of the map. Rather than lifting the buildings to let the units out, which is possible for Terran, it destroyed one building and then immediately began rebuilding it before it could move the units out!)

This means that, first, AlphaStar just doesn't have a decent response to strategies that it didn't evolve, and secondly, it doesn't do much in the way of a reactive decision tree of strategies (if I scout this, I do that). That kind of play is unfortunately very necessary for playing Zerg at a high level, so the internal meta has just collapsed into one where its Zerg agents predictably rush out early attacks that are easy to defend if expected. This has the flow-through effect that its Terran and Protoss are weaker against human Zerg than against other races, because they've never practiced against a solid Zerg that plays for the late game.

The end result cleaned up against weak players, performed well against good players, but practically never took a game against the top few players. I think that DeepMind realized they'd need another breakthrough to do what they did to Go, and decided to throw in the towel while making it look like they were claiming victory.

Finally, RL practitioners have known that genuine causal reasoning could never be achieved via known RL architectures- you'd only ever get something that could execute the same policy as an agent that had reasoned that way, via a very expensive process of evolving away from dominated strategies at each step down the tree of move and countermove. It's the biggest known unknown on the way to AGI.

Replies from: Wei_Dai

↑ comment by Wei Dai (Wei_Dai) · 2019-11-01T19:27:58.201Z · LW(p) · GW(p)

This is the clearest and most insightful analysis of AlphaStar I've seen and IMO really should be a top-level post.

Replies from: orthonormal

↑ comment by orthonormal · 2019-11-02T01:43:21.019Z · LW(p) · GW(p)

Thanks, will do.

comment by orthonormal · 2024-06-19T22:46:41.258Z · LW(p) · GW(p)

By my assessment, the employees who failed to sign the final leaked version of the Altman loyalty letter have now been literally decimated.

I'm trying to track the relative attrition for a Manifold market: of the 265 OpenAI employees who hadn't yet signed the loyalty letter by the time it was first leaked, what percent will still be at OpenAI on the one-year anniversary?

I'm combining that first leaked copy with 505 signatures, the final leaked copy with 702 signatures, the oft-repeated total headcount of 770, and this spreadsheet tracking OpenAI departures (albeit with many false positives—people self-reporting as OpenAI employees because they customized their GPTs—so I'm working to verify names that appear on the spreadsheet but not on the letter; I'm sure the spreadsheet has false negatives as well, alas).

So far, I've verified at least ~~seven~~ [update: seven, with a probable eighth] departures of eligible figures who hadn't signed the letter with 702 names: Leopold Aschenbrenner, Jay Joshi (not fully verified by me), Andrej Karpathy, Daniel Kokotajlo, Jan Leike, Lucas Negritto, Katarina Slama, and William Saunders. If it's true that the total headcount at the time was 770, then that's 8 out of 68, or 11.8%.

Compare that to the attrition rate (as per the spreadsheet) for those who had signed the final leaked version but not the first: 10 departures out of 197, or 5.1%; and compare that to the attrition rate for those who signed promptly: 13 departures out of 505, or 2.6%.

Any causal inferences from this correlation are left as an exercise to the reader.

(A more important exercise, however: can anyone find a confirmation of the 770 number outside of unsourced media reports, or find a copy of the loyalty letter with more than 702 signatories, or ideally find a list of everyone at OpenAI at the time? I've tried a few different avenues without success.)

Replies from: Linch, orthonormal, orthonormal

↑ comment by Linch · 2024-06-20T05:31:02.581Z · LW(p) · GW(p)

"decimate" is one of those relatively rare words where the literal meaning is much less scary than the figurative meaning.

Replies from: ChristianKl, rotatingpaguro

↑ comment by ChristianKl · 2024-06-24T16:42:32.981Z · LW(p) · GW(p)

The literal meaning does actually include killing people and nobody at OpenAI got killed.

↑ comment by rotatingpaguro · 2024-06-20T12:51:58.208Z · LW(p) · GW(p)

Isn't it the opposite? To decimate = to kill 1 in 10 soldiers, figuratively to remove a certain fraction of elements from a set.

Replies from: quetzal_rainbow

↑ comment by quetzal_rainbow · 2024-06-20T13:57:48.808Z · LW(p) · GW(p)

Figuratively it is used as "to kill 9 in 10".

↑ comment by orthonormal · 2024-10-01T03:19:29.720Z · LW(p) · GW(p)

EDIT: On reflection, I made this a full Shortform post [LW(p) · GW(p)].

With the sudden simultaneous exits of Mira Murati, Barret Zoph, and Bob McGrew, I thought I'd do a more thorough scan of the departures. I still think I'm missing some, so these are lower bounds (modulo any mistakes I've made).

Headline numbers:

Attrition for the 505 OpenAI employees who signed before the letter was first leaked: at least 24/505 = 4.8%
Attrition for the next 197 to sign (it was leaked again at 667 signatures, and one last time at 702): at least 13/197 = 6.6%
Attrition for the (reported) 68 who had not signed by the last leak: at least 19/68 = 27.9%.

Reportedly, 737 out of the 770 signed in the end, and many of the Superalignment team chose not to sign at all.

Below are my current tallies of some notable subsets. Please comment with any corrections!

People from the Superalignment team who never signed as of the 702 leak (including some policy/governance people who seem to have been closely connected) and are now gone:

Carroll Wainwright
Collin Burns
Cullen O'Keefe
Daniel Kokotajlo
Jan Leike (though he did separately Tweet that the board should resign)
Jeffrey Wu
Jonathan Uesato
Leopold Aschenbrenner
Mati Roy
William Saunders
Yuri Burda

People from the Superalignment team (and close collaborators) who did sign before the final leak but are now gone:

Jan Hendrik Kirchner (signed between 668 and 702)
Steven Bills (signed between 668 and 702)
John Schulman (signed between 506 and 667)
Sherry Lachman (signed between 506 and 667)
Ilya Sutskever (signed by 505)
Pavel Izmailov (signed by 505)
Ryan Lowe (signed by 505)
Todor Markov (signed by 505)

Others who didn't sign as of the 702 leak (some of whom may have just been AFK for the wrong weekend, though I doubt that was true of Karpathy) and are now gone:

Andrei Alexandru (Research Engineer)
Andrej Karpathy (Co-Founder)
Austin Wiseman (Finance/Accounting)
Girish Sastry (Policy)
Jay Joshi (Recruiting)
Katarina Slama (Member of Technical Staff)
Lucas Negritto (Member of Technical Staff, then Developer Community Ambassador)
Zarina Stanik (Marketing)

Notable other ex-employees:

Barrett Zoph (VP of Research, Post-Training; signed by 505)
Bob McGrew (Chief Research Officer; signed by 505)
Chris Clark (Head of Nonprofit and Strategic Initiatives; signed by 505)
Diane Yoon (VP of People; signed by 505)
Gretchen Krueger (Policy; signed by 505; posted a significant Twitter thread at the time)
Mira Murati (CTO; signed by 505)

↑ comment by orthonormal · 2024-06-20T17:21:33.090Z · LW(p) · GW(p)

Note on current methodology:

I am, for now, not doing further research when the spreadsheet lists a person whose name appears on the final leaked letter; so it's possible that some of the 23 departures among the 702 names on the final leaked letter are spurious. (I will be more thorough when I resolve the market after November.)
I am counting only full-time employees and not counting contractors, as I currently believe that the 770 figure refers only to full-time employees. So far, I've seen no contractors among those who signed, but I've only checked a few; if the letter includes some categories of contractors, this gets a lot harder to resolve.
I am counting nontechnical employees (e.g. recruiting, marketing) as well as technical staff, because such employees were among those who signed the letter.

comment by orthonormal · 2024-01-24T23:57:11.632Z · LW(p) · GW(p)

"I endorse endorsing X" is a sign of a really promising topic for therapy (or your preferred modality of psychological growth).

If I can simply say "X", then I'm internally coherent enough on that point.

If I can only say "I endorse X", then not-X is psychologically load-bearing for me, but often in a way that is opaque to my conscious reasoning, so working on that conflict can be slippery.

But if I can only say "I endorse endorsing X", then not only is not-X load-bearing for me, but there's a clear feeling of resistance to X that I can consciously hone in on, connect with, and learn about.

Replies from: Dagon

↑ comment by Dagon · 2024-01-26T04:56:47.686Z · LW(p) · GW(p)

I'd understand this better (and perhaps even agree) if there were a few examples and a few counter-examples to find the boundaries of when this is effective.

For myself, without more words like "I endorse endorsing X under Y conditions because X is good for those who are hearing the endorsement and not necessarily for the endorser", I don't see how it works. The direct, unconditional form just makes me notice my dissonance and worry at it until I either endorse X or not-X (or neither - I'm allowed to be uncertain or ambivalent or just "context-dependent").

Replies from: orthonormal

↑ comment by orthonormal · 2024-01-26T17:48:17.923Z · LW(p) · GW(p)

Ah, I'm talking about introspection in a therapy context and not about exhorting others.

For example:

Internal coherence: "I forgive myself for doing that stupid thing".

Load-bearing but opaque: "It makes sense to forgive myself, and I want to, but for some reason I just can't".

Load-bearing and clear resistance: "I want other people to forgive themselves for things like that, but when I think about forgiving myself, I get a big NOPE NOPE NOPE".

P.S. Maybe forgiving oneself isn't actually the right thing to do at the moment! But it will also be easier to learn that in the third case than in the second.

comment by orthonormal · 2022-04-12T21:49:18.430Z · LW(p) · GW(p)

Has any serious AI Safety research org thought about situating themselves so that they could continue to function after a nuclear war?

Wait, hear me out.

A global thermonuclear war would set AI timelines back by at least a decade, for all of the obvious reasons. So an AI Safety org that survived would have additional precious years to work on the alignment problem, compared to orgs in the worlds where we avoid that war.

So it seems to me that at least one org with short timelines ought to move to New Zealand or at least move farther away from cities.

(Yes, I know MIRI was pondering leaving the Bay Area for underspecified reasons. I'd love to know what their thinking was regarding this effect, but I don't expect they'd reveal it.)

Replies from: mesaoptimizer, orthonormal

↑ comment by mesaoptimizer · 2024-06-20T22:35:45.477Z · LW(p) · GW(p)

I think we'll have bigger problems than just solving the alignment problem, if we have a global thermonuclear war that is impactful enough to not only break the compute supply and improvement trends, but also destabilize the economy and geopolitical situation enough that frontier labs aren't able to continue experimenting to find algorithmic improvements.

Agent foundations research seems robust to such supply chain issues, but I'd argue that gigantic parts of the (non-academic, non-DeepMind specific) conceptual alignment research ecosystem is extremely dependent on a stable and relatively resource-abundant civilization: LW, EA organizations, EA funding, individual researchers having the slack to do research, ability to communicate with each other and build on each other's research, etc. Taking a group of researchers and isolating them in some nuclear-war-resistant country is unlikely to lead to an increase in marginal research progress in that scenario.

↑ comment by orthonormal · 2024-06-20T17:26:39.314Z · LW(p) · GW(p)

The spun-off agent foundations team seems to have less reason than most AI safety orgs to be in the Bay Area, so moving to NZ might be worth considering for them.

comment by orthonormal · 2019-10-31T05:24:47.998Z · LW(p) · GW(p)

[Cross-posted from Medium, written for a pretty general audience]

There are many words that could describe my political positions. But there's one fundamental label for me: I am a consequentialist.

Consequentialism is a term from ethics; there, it means the position that consequences are what truly make an action right or wrong, rather than rules or virtues. What that means is that for me, the most essential questions about policy aren't things like "what is fair" or "what rights do people have", although these are good questions. For me, it all boils down to "how do we make people's lives better?"

(There are some bits of nuance to the previous paragraph, which I've kept as a long endnote.)

"Make people's lives better" isn't a platitude- there's a real difference here! To explain, I want to point out that there are both consequentialists and non-consequentialists within different political camps. Let's consider socialists first and then libertarians second.

Many socialists believe both that (A) the world is headed for plutocratic disaster unless capitalism is overthrown, and that (B) labor markets and massive wealth disparities would be crimes even if they did not doom others to suffering. The difference is that some are more motivated by beliefs like (A), and could thus change their positions if convinced that e.g. the Nordic model was much better for future growth than a marketless society; while others are more motivated by beliefs like (B), and would continue to support pure socialism even if they were convinced it would mean catastrophe.

And many libertarians believe both that (A') the only engine that can consistently create prosperity for all is a free market with no interference, and that (B') taxation is a monstrous act of aggression and theft. The difference is that some are more motivated by beliefs like (A'), and thus could change their position if convinced that e.g. progressive taxation and redistribution would not destroy the incentives behind economic growth; while others are more motivated by beliefs like (B'), and would continue to support pure libertarianism even if they were convinced it would mean catastrophe.

I find it fruitful to talk with the first kind of socialist and the first kind of libertarian, but not the second kind of either. The second type just isn’t fundamentally interested in thinking about the consequences (except insofar as they can convince others by arguing for certain consequences). But among the first type, it’s possible to figure out the truth together by arguing about historical cases, studying natural experiments in policy, and articulating different theories.

I hope it's been helpful to draw out this distinction; I'd encourage you to first find fellow consequentialists among your natural allies, and expand from there when and if you feel comfortable. There's a lot that can be done to make the world a better place, and those of us who care most about making the world better can achieve more once we find each other!

P.S. The above focuses on the sort of political questions where most people's influence is limited to voting and convincing others to vote with them. But there's more ways to have an effect than that; I'd like to take one last moment to recommend the effective altruism movement [? · GW], which investigates the best ways for people to have a big positive impact on the world.

---

Nuance section:

the position that consequences are what truly make an action right or wrong

There's a naive version of this, which is that you should seize any good immediate outcome you can, even by doing horrific things. That's... not a healthy version of consequentialism. The way to be less naive is to care about long-term consequences, and also to expect that you can't get away with keeping your behavior secret from others in general. Here's a good account of what non-naive consequentialism can look like.

the most essential questions about policy aren't things like "what is fair" or "what rights do people have", although these are good questions

In particular, fairness and rights are vital to making people's lives better! We want more than just physical comforts; we want autonomy and achievement and meaning, we want to have trustworthy promises about what the world will ask of us tomorrow, and we want past injustices to be rectified. But these can be traded off, in extreme situations, against the other things that are important for people. In a massive emergency, I'd rather save lives in an unfair way and try to patch up the unfairness later, than let people die to preserve fairness.

how do we make people's lives better?

This gets complicated and weird when you apply it to things like our distant descendants, but there are some aspects in the world today that seem fairly straightforward. Our world has built an engine of prosperity that makes food and goods available to many, beyond what was dreamt of in the past. But many people in the world are still living short and painful lives filled with disease and starvation. Another dollar of goods will do much more for one of them than for one of us. If we can improve their lives without destroying that engine, it is imperative to do that. (What consequentialists mostly disagree on is how the engine really works, how it could be destroyed, and how it could be improved!)

Replies from: cousin_it

↑ comment by cousin_it · 2019-10-31T09:26:35.099Z · LW(p) · GW(p)

It seems to me that your examples of B are mostly deontological, so it would be nice to have some C which represented virtue ethics as well.

Replies from: orthonormal

↑ comment by orthonormal · 2019-11-01T16:32:52.993Z · LW(p) · GW(p)

Virtue ethics seems less easily applicable to the domain of "what governmental policies to support" than to the domain of personal behavior, so I had a hard time thinking of examples. Can you?

Replies from: Pattern

↑ comment by Pattern · 2019-11-10T14:55:54.172Z · LW(p) · GW(p)

On politics, virtue ethics might say: "try to have leaders that are good"*, "accepting bribes is wrong", and perhaps "seek peace and shared ground rather than division and fear." (Working towards peace seems more virtuous than fear mongering.)

*and if they're not good, try and change that - gradual progress is better than no progress at all.

comment by orthonormal · 2025-03-25T02:06:36.076Z · LW(p) · GW(p)

How do you formalize the definition of a decision-theoretically fair problem, even when abstracting away the definition of an agent as well as embedded agency?

I've failed to find anything in our literature.

It's simple to define a fair environment, given those abstractions: a function E from an array of actions to an array of payoffs, with no reference to any other details of the non-embedded agents that took those actions and received those payoffs.

However, fair problems are more than just fair environments: we want a definition of a fair problem (and fair agents) under which, among other things:

The classic Newcomb's Problem against Omega, with certainty or with 1% random noise: fair
Omega puts $1M in the box iff it predicts that the player consciously endorses one-boxing, regardless of what it predicts the player will actually do (e.g. misunderstand the instructions and take a different action than they endorsed): unfair
Prisoner's Dilemma between two agents who base their actions on not only each others' predicted actions in the current environment, but also their predicted actions in other defined-as-fair dilemmas: fair
- For example, PrudentBot will cooperate with you if it deduces that you will cooperate with it and also that you would defect against DefectBot, because it wants to exploit CooperateBots).
Prisoner's Dilemma between two agents who base their actions on each others' predicted actions in defined-as-unfair dilemmas: unfair
- It would let us smuggle in unfairness from other dilemmas; e.g. if BlueEyedBot only tries Löbian cooperation against agents with blue eyes, and MetaBlueEyedBot only tries Löbian cooperation against agents that predictably cooperate with BlueEyedBot, then the Prisoner's Dilemma against MetaBlueEyedBot should count as unfair.

Modal combat doesn't need to worry about this, because all the agents in it are fair-by-construction.

Yeah, I know, it's about a decade late to be asking this question.

Replies from: Gurkenglas, Vladimir_Nesov, orthonormal

↑ comment by Gurkenglas · 2025-03-25T12:23:07.025Z · LW(p) · GW(p)

It sounds like you're trying to define unfair as evil.

↑ comment by Vladimir_Nesov · 2025-03-25T05:40:51.670Z · LW(p) · GW(p)

It's an essential aspect of decision making for an agent to figure out where it might be. Thought experiments try to declare the current situation, but they don't necessarily need to be able to convincingly succeed. Algorithmic induction, such as updating from Solomonoff prior, is the basic way an agent figures out which situations it should care about, and declaring that we are working with a particular thought experiment doesn't affect the prior. In line with updatelessness, an agent should be ready for observations in general (according to which of them it cares about more), rather than particular "fair" observations, so distinguishing observations that describe "fair" thought experiments doesn't seem right either.

↑ comment by orthonormal · 2025-03-25T05:04:13.587Z · LW(p) · GW(p)

My current candidate definitions, with some significant issues in the footnotes:

A fair environment is a probabilistic function from an array of actions to an array of payoffs.

An agent $A$ is a random variable

$A (F, A_{1}, . . ., A_{i - 1}, A_{i} = A, A_{i + 1}, . . ., A_{N})$

which takes in a fair environment $F$ ^[1] and a list of agents (including itself), and outputs a mixed strategy over its available actions in $F$ . ^[2]

A fair agent is one whose mixed strategy is a function of subjective probabilities^[3] that it assigns to [the actions of some finite collection of agents in fair environments, where any agents not appearing in the original problem must themselves be fair].

Formally, if $A$ is a fair agent in with a subjective probability estimator $P$ , $A$ 's mixed strategy in a fair environment $F$ ,

$A (F, A_{1}, . . ., A_{i - 1}, A_{i} = A, A_{i + 1}, . . ., A_{N})$

should depend only on a finite collection of $A$ 's subjective probabilities about outcomes

${P (F_{k} (A_{1}, . . ., A_{N}, B_{1}, . . . B_{M})) = [X_{1}, . . ., X_{N + M}]}_{k = 1}^{K}$

for a set of fair environments $F_{1}, . . ., F_{K}$ and an additional set of fair^[4] agents^[5] $B_{1}, . . ., B_{M}$ if needed (note that not all agents need to appear in all environments).

A fair problem is a fair environment with one designated player, where all other agents are fair agents.

^{^}
I might need to require every $F$ to have a default action $d_{F}$ , so that I don't need to worry about axiom-of-choice issues when defining an agent over the space of all fair environments.
^{^}
I specified a probabilistic environment and mixed strategies because I think there should be a unique fixed point for agents, such that this is well-defined for any fair environment $F$ . (By analogy to reflective oracles.) But I might be wrong, or I might need further restrictions on $F$ .
^{^}
Grossly underspecified. What kinds of properties are required for subjective probabilities here? You can obviously cheat by writing BlueEyedBot into your probability estimator.
^{^}
This is an infinite recursion, of course. It works if we require each $B_{m}$ to have a strictly lower complexity in some sense than $A$ (e.g. the rank of an agent is the largest number $K$ of environments it can reason about when making any decision, and each $B_{m}$ needs to be lower-rank than $A$ ), but I worry that's too strong of a restriction and would exclude some well-definable and interesting agents.
^{^}
Does the fairness requirement on the $B_{m}$ suffice to avert the MetaBlueEyedBot problem in general? I'm unsure.

comment by orthonormal · 2021-12-28T20:50:02.964Z · LW(p) · GW(p)

Is there already a concept handle for the notion of a Problem Where The Intuitive Solution Actually Makes It Worse But Makes You Want To Use Even More Dakka On It?

My most salient example is the way that political progressives in the Bay Area tried using restrictive zoning and rent control in order to prevent displacement... but this made for a housing shortage and made the existing housing stock skyrocket in value... which led to displacement happening by other (often cruel and/or backhanded) methods... which led to progressives concluding that their rules weren't restrictive enough.

Another example is that treating a chunk of the population with contempt makes a good number of people in that chunk become even more opposed to you, which makes you want to show even more contempt for them, etc. (Which is not to say their ideas are correct or even worthy of serious consideration - but the people are always worthy of respect.)

That sort of dynamic is how you can get an absolutely fucked-up self-reinforcing situation, an inadequate quasi-equilibrium that's not even a Nash equilibrium, but exists because at least one party is completely wrong about its incentives.

(And before you get cynical, of course there are disingenuous people whose preferences are perfectly well served in that quasi-equilibrium. But most activists do care about the outcomes, and would change their actions if they were genuinely convinced the outcomes would be different.)

Replies from: pjeby, michael-cohn

↑ comment by pjeby · 2021-12-28T22:46:02.735Z · LW(p) · GW(p)

"The Human Condition"? ;-)

More seriously, though, do you have any examples that aren't based on the instinct-to-punish(reality, facts, people,...) that I ranted about in Curse of the Counterfactual? If they all fall in this category, one could call it an Argument With Reality, which is Byron Katie's term for it. (You could also call it, "The Principle of the Thing", an older and more colloquial term for people privileging the idea of a thing over the substance of the thing, usually to an irrational extent.)

When people are having an Argument With Reality, they:

Go for approaches that impose costs on some target(s), in preference to ones that are of benefit to anyone
Refuse to acknowledge other points of view except for how it proves those holding them to be the Bad Wrong Enemies
Double down as long as reality refuses to conform or insufficient Punishment has occurred (defined as the Bad Wrong Enemies surrendering and submitting or at least showing sufficiently-costly signals to that effect)

A lot of public policy is driven this way; Wars on Abstract Nouns are always more popular than rehabiliation, prevention, and other benefit-oriented policies, which will be denigrated as being too Soft On Abstract Nouns. (This also applies of course to non-governmental public policies, with much the same incentives for anybody in the public view to avoid becoming considered one of the Bad Wrong Enemies.)

↑ comment by Michael Cohn (michael-cohn) · 2021-12-29T23:48:17.286Z · LW(p) · GW(p)

In terms of naming / identifying this, do you think it would help to distinguish what makes you want to double down on the current solution? I can think of at least 3 reasons:

Not being aware that it's making things worse
Knowing that it made things worse, but feeling like giving up on that tactic would make things get even worse instead of better
Being committed to the tactic more than to the outcome (what pjeby described as "The Principle of the Thing") -- which could itself have multiple reasons, including emotionally-driven responses, duty-based reasoning, or explicitly believing that doubling down somehow leads to better outcomes in the long run.

Do these all fall within the phenomenon you're trying to describe?

Replies from: orthonormal

↑ comment by orthonormal · 2021-12-31T05:13:48.640Z · LW(p) · GW(p)

Thanks for drawing distinctions - I mean #1 only.

comment by orthonormal · 2020-03-13T04:05:11.179Z · LW(p) · GW(p)

[EDIT: found it. Extensional vs intensional [LW · GW].]

Eliezer wrote something about two types of definitions, one where you explain your criterion, and one where you point and say "things like that and that, but not that or that". I thought it was called intensive vs extensive definition, but I can't find the post I thought existed. Does anyone else remember this?

Replies from: Zack_M_Davis

↑ comment by Zack_M_Davis · 2020-03-13T04:41:52.800Z · LW(p) · GW(p)

Some authors use ostensive to mean the same thing as "extensional."

comment by orthonormal · 2019-11-17T21:04:18.954Z · LW(p) · GW(p)

Is there a word for problems where, as they get worse, the exactly wrong response becomes more intuitively appealing?

For example, I'm thinking of the following chain (sorry for a political example, this is typically a political phenomenon):

resistance to new construction (using the ability of local boards to block projects)

causes skyrocketing rent

which together mean that the rare properties allowed to be developed get bid up to where they can only become high-end housing

which leads to anger at rich developers for building "luxury housing"

which leads to further resistance to new construction

and so on until you get San Francisco

Replies from: Viliam

↑ comment by Viliam · 2019-11-18T22:30:29.178Z · LW(p) · GW(p)

You probably already know that, but [LW · GW] a subset of what you described is called "positive feedback loop".

comment by orthonormal · 2019-11-09T18:03:23.756Z · LW(p) · GW(p)

Decision-theoretic blackmail is when X gets Y to choose A over B, not via acting to make the consequences of A more appealing to Y, but by making the consequences of B less appealing to Y.

The exceptions to this definition are pretty massive, though, and I don't know a principled emendation that excludes them.

1. There's a contract / social contract / decision-theoretic equilibrium, and within that, B will be punished. (This may not be a true counterexample, because the true choice is whether to join the contract... though this is less clear for the social contract than for the other two.)

2. Precommitting not to give in to blackmail is not itself blackmail. Of course, in an ultimatum game both players can imagine themselves as doing this.

Can anyone think of more exceptions, or a redefinition that clearly excludes these?

comment by orthonormal · 2024-02-02T04:25:28.930Z · LW(p) · GW(p)

In high-leverage situations, you should arguably either be playing tic-tac-toe (simple, legible, predictable responses) or playing 4-D chess to win. If you're making really nonstandard and surprising moves (especially in PR), you have no excuse for winding up with a worse outcome than you would have if you'd acted in bog-standard normal ways.

(This doesn't mean suspending your ethics! Those are part of winning! But if you can't figure out how to win 4-D chess ethically, then you need to play an ethical tic-tac-toe strategy instead.)

comment by orthonormal · 2025-04-01T18:48:48.931Z · LW(p) · GW(p)

[EDIT: Never mind, this is just Kleene's second recursion theorem!]

Quick question about Kleene's recursion theorem:

Let's say F is a computable function from ℕ^N to ℕ. Is there a single computable function X from ℕ^N to ℕ such that

X = F(X, y_2,..., y_N) for all y_2,...,y_N in ℕ

(taking the X within F as the binary code of X in a fixed encoding) or do there need to be additional conditions?

orthonormal's Shortform

Contents

38 comments