Comment by Dweomite on When Are Circular Definitions A Problem? · 2024-06-05T22:47:14.747Z · LW · GW

Using only circular definitions, is it possible to constrain words' meanings so tightly that there's only one possible model which fits those constraints?

Isn't this sort-of what all formal mathematical systems do?  You start with some axioms that define how your atoms must relate to each other, and (in a good system) those axioms pin the concepts down well enough that you can start proving a bunch of theorems about them.

Comment by Dweomite on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-21T22:52:06.325Z · LW · GW

I am not a lawyer, and my only knowledge of this agreement comes from the quote above, but...if the onboarding paperwork says you need to sign "a" general release, but doesn't describe the actual terms of that general release, then it's hard for me to see an interpretation that isn't either toothless or crazy:

  1. If you interpret it to mean that OpenAI can write up a "general release" with absolutely any terms they like, and you have to sign that or lose your PPUs, then that seems like it effectively means you only keep your PPUs at their sufferance, because they could simply make the terms unconscionable.  (In general, any clause that requires you to agree to "something" in the future without specifying the terms of that future agreement is a blank check.)
  2. If you interpret it to mean either that the employee can choose the exact terms, or that the terms must be the bare minimum that would meet the legal definition of "a general release", then that sounds like OpenAI has no actual power to force the non-disclosure or non-disparagement terms--although they could very plausibly trick employees into thinking they do, and threaten them with costly legal action if they resist.  (And once the employee has fallen for the trick and signed the NDA, the NDA itself might be enforceable?)
  3. Where else are the exact terms of the "general release" going to come from, if they weren't specified in advance and neither party has the right to choose them?
Comment by Dweomite on "Fractal Strategy" workshop report · 2024-04-22T03:16:11.471Z · LW · GW

In principle, any game where the player has a full specification of how the game works is immune to this specific failure mode, whether it's multiplayer or not.  (I say "in principle" because this depends on the player actually using the info; I predict most people playing Slay the Spire for the first time will not read the full list of cards before they start, even if they can.)

The one-shot nature makes me more concerned about this specific issue, rather than less.  In a many-shot context, you get opportunities to empirically learn info that you'd otherwise need to "read the designer's mind" to guess.

Mixing in "real-world" activities presumably helps.

If it were restricted only to games, then playing a variety of games seems to me like it would help a little but not that much (except to the extent that you add in games that don't have this problem in the first place).  Heuristics for reading the designer's mind often apply to multiple game genres (partly, but not solely, because nearly all genres now have "RPG" in their metaphorical DNA), and even if different heuristics are required, it's not clear that would help much if each individual heuristic is still oriented around mind-reading.

Comment by Dweomite on "Fractal Strategy" workshop report · 2024-04-17T00:22:40.678Z · LW · GW

I have an intuition that you're partly getting at something fundamental, and also an intuition that you're partly going down a blind alley, and I've been trying to pick apart why I think that.

I think that "did your estimate help you strategically?" has a substantial dependence on the "reading the designer's mind" stuff I was talking about above.  For instance, I've made extremely useful strategic guesses in a lot of games using heuristics like:

  • Critical hits tend to be over-valued because they're flashy
  • Abilities with large numbers appearing as actual text tend to be over-valued, because big numbers have psychological weight separate from their actual utility
  • Support roles, and especially healing, tend to be under-valued, for several different reasons that all ultimately ground out in human psychology

All of these are great shortcuts to finding good strategies in a game, but they all exploit the fact that some human being attempted to balance the game, and that that human had a bunch of human biases.

I think if you had some sort of tournament about one-shotting Luck Be A Landlord, the winner would mostly be determined by mastery of these sorts of heuristics, which mostly doesn't transfer to other domains.

However, I can also see some applicability for various lower-level, highly-general skills like identifying instrumental and terminal values, gears-based modeling, quantitative reasoning, noticing things you don't know (then forming hypotheses and performing tests), and so forth.  Standard rationality stuff.


Different games emphasize different skills.  I know you were looking for specific things like resource management and value-of-information, presumably in an attempt to emphasize skills you were more interested in.

I think "reading the designer's mind" is a useful category for a group of skills that is valuable in many games but that you're probably less interested in, and so minimizing it should probably be one of the criteria you use to select which games to include in exercises.

I already gave the example of book games as revolving almost entirely around reading the designer's mind.  One example at the opposite extreme would be a game where the rules and content are fully-known in advance...though that might be problematic for your exercise for other reasons.

It might be helpful to look for abstract themes or non-traditional themes, which will have less associational baggage.

I feel like it ought to be possible to deliberately design a game to reward the player mostly for things other than reading the designer's mind, even in a one-shot context, but I'm unsure how to systematically do that (without going to the extreme of perfect information).

Comment by Dweomite on "Fractal Strategy" workshop report · 2024-04-13T16:23:50.637Z · LW · GW

Oh, hm.  I suppose I was thinking in terms of better-or-worse quantitative estimates--"how close was your estimate to the true value?"--and you're thinking more in terms of "did you remember to make any quantitative estimate at all?"

And so I was thinking the one-shot context was relevant mostly because the numerical values of the variables were unknown, but you're thinking it's more because you don't yet have a model that tells you which variables to pay attention to or how those variables matter?

Comment by Dweomite on "Fractal Strategy" workshop report · 2024-04-13T04:09:13.705Z · LW · GW

I'm kinda arguing that the skills relevant to the one-shot context are less transferable, not more.

It might also be that they happen to be the skills you need, or that everyone already has the skills you'd learn from many-shotting the game, and so focusing on those skills is more valuable even if they're less transferable.

But "do I think the game designer would have chosen to make this particular combo stronger or weaker than that combo?" does not seem to me like the kind of prompt that leads to a lot of skills that transfer outside games.

Comment by Dweomite on "Fractal Strategy" workshop report · 2024-04-13T01:08:14.316Z · LW · GW

OK.  So the thing that jumps out at me here is that most of the variables you're trying to estimate (how likely are cards to synergize, how large are those synergies, etc.) are going to be determined mostly by human psychology and cultural norms, to the point where your observations of the game itself may play only a minor role until you get close-to-complete information.  This is the sort of strategy I call "reading the designer's mind."

The frequency of synergies is going to be some compromise between what the designer thought would be fun and what the designer thought was "normal" based on similar games they've played.  The number of cards is going to be some compromise between how motivated the designer was to do the work of adding more cards and how many cards customers expect to get when buying a game of this type. Etc.


As an extreme example of what I mean, consider book games, where the player simply reads a paragraph of narrative text describing what's happening, chooses an option off a list, and then reads a paragraph describing the consequences of that choice.  Unlike other games, where there are formal systematic rules describing how to combine an action and its circumstances to determine the outcome, in these games your choice just does whatever the designer wrote in the corresponding box, which can be anything they want.

I occasionally see people praise this format for offering consequences that truly make sense within the game-world (instead of relying on a simplified abstract model that doesn't capture every nuance of the fictional world), but I consider that to be a shallow illusion.  You can try to guess the best choice by reasoning out the probable consequences based on what you know of the game's world, but the answers weren't actually generated by that world (or any high-fidelity simulation of it).  In practice you'll make better guesses by relying on story tropes and rules of drama, because odds are quite high that the designer also relied on them (consciously or not).  Attempting to construct a more-than-superficial model of the story's world is often counter-productive.

And no matter how good you are, you can always lose just because the designer was in a bad mood when they wrote that particular paragraph.


Strategy games like Luck Be A Landlord operate on simple and knowable rules, rather than the inscrutable whims of a human author (which is what makes them strategy games).  But the particular variables you listed aren't the outputs of those rules, they're the inputs that the designer fed into them.  You're trying to guess the one part of the game that can't be modeled without modeling the game's designer.

I'm not quite sure how much this matters for teaching purposes, but I suspect it matters rather a lot.  Humans are unusual systems in several ways, and people who are trying to predict human behavior often deploy models that they don't use to predict anything else.

What do you think?

Comment by Dweomite on "Fractal Strategy" workshop report · 2024-04-12T22:09:22.492Z · LW · GW

I feel confused about how Fermi estimates were meant to apply to Luck Be a Landlord.  I think you'd need error bars much smaller than 10x to make good moves at most points in the game.
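One way to make the error-bar point concrete, as a rough sketch: treat a Fermi estimate as saying the true value lies within a factor of k of the estimate. Two options can only be ranked if those ranges don't overlap, which requires k to be smaller than the square root of the ratio between the two values. The per-spin values below are made-up numbers, not taken from the game.

```python
import math


def interval(estimate: float, factor: float) -> tuple[float, float]:
    """Range of true values consistent with an estimate that is good to within `factor`."""
    return estimate / factor, estimate * factor


def can_rank(a: float, b: float, factor: float) -> bool:
    """True if the uncertainty ranges around two estimates do not overlap."""
    lo, hi = sorted((a, b))
    return interval(hi, factor)[0] > interval(lo, factor)[1]


def max_factor_to_rank(a: float, b: float) -> float:
    """Largest error factor that still separates the two estimates: sqrt(hi / lo)."""
    lo, hi = sorted((a, b))
    return math.sqrt(hi / lo)


# Hypothetical values: two symbols estimated at 3 and 5 coins per spin.
print(can_rank(3, 5, 10))                  # False: a 10x band swamps the difference
print(can_rank(3, 5, 1.2))                 # True
print(round(max_factor_to_rank(3, 5), 3))  # 1.291
```

Under this framing, telling apart two options that differ by about 67% needs estimates good to within roughly 30%, far tighter than an order of magnitude.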

Comment by Dweomite on Social status part 1/2: negotiations over object-level preferences · 2024-03-19T19:53:04.196Z · LW · GW

I came to a similar conclusion when thinking about the phenomenon of "technically true" deceptions.

Most people seem to have a strong instinct to say only technically-true things, even when they are deliberately deceiving someone (and even when this restriction significantly reduces their chances of success).  Yet studies find that the victims of a deception don't much care whether the deceiver was being technically truthful.  So why the strong instinct to do this costly thing, if the interlocutor doesn't care?

I currently suspect the main evolutionary reason is that a clear and direct lie makes it easier for the victim to trash your reputation with third parties.  "They said X; the truth was not-X; they're a liar."

If you only deceive by implication, then the deception depends on a lot of context that's difficult for the victim to convey to third parties.  The act of making the accusation becomes more costly, because more stuff needs to be communicated.  Third parties may question whether the deception was intentional.  It becomes harder to create common knowledge of guilt:  Even if one listener is convinced, they may doubt whether other listeners would be convinced.

Thus, though the victim is no less angry, the counter-attack is blunted.

Comment by Dweomite on One-shot strategy games? · 2024-03-19T04:38:53.443Z · LW · GW

Some concepts that I use:

Randomness is when the game tree branches according to some probability distribution specified by the rules of the game.  Examples:  rolling a die; cutting a deck at a random card.

Slay the Spire has randomness; Chess doesn't.
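As a small illustration of randomness in this sense (a sketch with a made-up tree and payoffs): a game tree can mix decision nodes, where the player chooses, with chance nodes, whose branch probabilities are fixed by the rules. Evaluating such a tree with expectimax takes the best child at a decision node and the probability-weighted average at a chance node. The tuple encoding of nodes is an illustrative choice, not any particular library's format.

```python
from fractions import Fraction


def value(node) -> Fraction:
    """Expectimax value of a node.

    Hypothetical encoding: a number is a terminal payoff, ("choice", [children])
    is a player decision, and ("chance", [(probability, child), ...]) is a random event.
    """
    if isinstance(node, (int, Fraction)):
        return Fraction(node)                      # terminal payoff
    kind, children = node
    if kind == "choice":
        return max(value(child) for child in children)          # player picks the best option
    if kind == "chance":
        return sum(p * value(child) for p, child in children)   # rules fix the odds
    raise ValueError(f"unknown node kind: {kind!r}")


die_roll = ("chance", [(Fraction(1, 6), face) for face in range(1, 7)])  # pays the face value
safe = 3
tree = ("choice", [die_roll, safe])
print(value(tree))  # 7/2
```

A Chess tree in this encoding would contain only "choice" nodes (alternating between the two players), which is one way to see the difference.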

Hidden Information is when some variable that you can't directly observe influences the evolution of the game.  Examples: a card in an opponent's hand, which they can see but you can't; the 3 solution cards set aside at the start of a game of Clue; the winning pattern in a game of Mastermind.

People sometimes consider "hidden information" to include randomness, but I more often find it helpful to separate them.

However, it's not always obvious which model should be used.  For example, I usually find it most helpful to think of a shuffled deck as generating a random event each time you draw from the deck (as if you were taking a randomly-selected card from an unordered pool), but it's also possible to think of shuffling the deck as having created hidden information (the order that the deck is in), and it may be necessary to switch to this more-complicated model if there are rules that let players modify the deck (e.g. peeking at the top card, or inserting a card at a specific position).
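The two deck models can be compared exactly on a tiny hypothetical four-card deck by enumerating every equally likely shuffle. With no rules that touch the deck's order, the top card of a shuffled deck has the same distribution as a uniform draw from the pool. A hypothetical "peek at the top card and put it on the bottom" rule breaks that equivalence: the ordered model knows the peeked card can't come next, while the pool model still gives it a 1-in-4 chance.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

CARDS = ("A", "B", "C", "D")  # hypothetical four-card deck


def distribution(outcomes):
    """Turn a list of equally likely outcomes into {card: probability}."""
    counts = Counter(outcomes)
    return {card: Fraction(n, len(outcomes)) for card, n in counts.items()}


def pool_model_next_draw(remaining):
    """Random-event model: the next draw is uniform over the cards still in the pool."""
    return distribution(list(remaining))


def ordered_model_top_card():
    """Hidden-information model: enumerate every shuffle and read off the top card."""
    return distribution([order[0] for order in permutations(CARDS)])


def ordered_model_after_peek_to_bottom(peeked):
    """Peek at the top card, see `peeked`, move it to the bottom, then draw."""
    orders = [order[1:] + order[:1] for order in permutations(CARDS) if order[0] == peeked]
    return distribution([order[0] for order in orders])


print(ordered_model_top_card() == pool_model_next_draw(CARDS))  # True
print(ordered_model_after_peek_to_bottom("C"))  # A, B and D at 1/3 each; C cannot come next
print(pool_model_next_draw(CARDS)["C"])         # 1/4: the pool model can't express the peek
```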

Similar reasoning applies to a PRNG:  I usually think of it as a random event each time a number is generated, though it's also possible to think of it as a hidden seed value that you learn a little bit about each time you observe an output (and a designer may need to think in this second way to ensure their PRNG is not too exploitable).
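A toy version of the hidden-seed view, assuming a deliberately tiny seed space of 10,000 seeds so that brute force is instant (real games typically use far larger seeds, so this is only illustrative). It uses Python's seedable `random.Random`. Each observed die roll rules out most of the remaining candidate seeds, and once the candidates are narrowed down, future rolls become predictable.

```python
import random

SEED_SPACE = range(10_000)  # assumption: a tiny seed space, so brute force is fast


def rolls_from(seed: int, n: int) -> list[int]:
    """The first n die rolls a PRNG produces when started from `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]


def consistent_seeds(observed: list[int]) -> list[int]:
    """Every seed in the space that would have produced exactly these rolls."""
    return [s for s in SEED_SPACE if rolls_from(s, len(observed)) == observed]


secret_seed = 4321  # hidden from the "player"
observed = rolls_from(secret_seed, 12)
for k in (1, 4, 12):
    # Each extra observation narrows down the hidden seed a little more.
    print(k, "rolls seen ->", len(consistent_seeds(observed[:k])), "candidate seeds")

candidates = consistent_seeds(observed)
predicted_next = {rolls_from(s, 13)[-1] for s in candidates}
print("predicted:", predicted_next, "actual:", rolls_from(secret_seed, 13)[-1])
```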

Rule of thumb:  If you learn some information about the same variable more than once, then it's hidden info.  For instance, a card in your opponent's hand will influence their strategy, so you gain a little info about it whenever they move, which makes it hidden info.  If a variable goes from completely hidden to completely revealed in a single step (or if any remaining uncertainty has no impact on the game), then it's just randomness.
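The rule of thumb can be sketched as repeated Bayesian updating. In this hypothetical model (the probabilities are invented), the opponent's hidden card is either "strong" or "weak", and a strong card makes aggressive moves more likely. Each observed move shifts the belief a little, and the same variable keeps getting updated, unlike a die roll that is revealed all at once.

```python
from fractions import Fraction

# Hypothetical model of how the hidden card influences the opponent's moves.
PRIOR = {"strong": Fraction(1, 2), "weak": Fraction(1, 2)}
P_AGGRESSIVE = {"strong": Fraction(4, 5), "weak": Fraction(3, 10)}


def update(belief, move):
    """One Bayesian update after watching a single opponent move."""
    likelihood = {
        card: P_AGGRESSIVE[card] if move == "aggressive" else 1 - P_AGGRESSIVE[card]
        for card in belief
    }
    unnormalized = {card: belief[card] * likelihood[card] for card in belief}
    total = sum(unnormalized.values())
    return {card: weight / total for card, weight in unnormalized.items()}


belief = PRIOR
for move in ["aggressive", "aggressive", "cautious", "aggressive"]:
    belief = update(belief, move)
    print(move, round(float(belief["strong"]), 3))
```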

Interesting Side Note:  Monte Carlo Tree Search can handle randomness just fine, but really struggles with hidden information.

A Player is a process that selects between different game-actions based on strategic considerations, rather than a simple stochastic process.  An important difference between Chess and Slay the Spire is that Chess includes a second player.

We typically treat players as "outside the game" and unconstrained by any rules, though of course in any actual game the player has to be implemented by some actual process.  The line between "a player who happens to be an AI" and "a complicated game rule for selecting the next action" can be blurry.

A Mixed Equilibrium is when the rules of the game reward players for deliberately including randomness in their decision process.  For instance, in rock-paper-scissors, the game proceeds completely deterministically for a given set of player inputs, but there remains an important sense in which RPS is random and Chess is not:  RPS rewards players for acting randomly, and Chess doesn't.
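A minimal sketch of why rock-paper-scissors rewards randomness even though its payoff rule is deterministic: score each strategy by its expected payoff against an opponent who knows the strategy (but not the outcome of the coin flips) and picks the best reply. Every pure strategy can be exploited for a guaranteed loss, while the uniform mix can't be pushed below a draw on average.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine: str, theirs: str) -> int:
    """+1 for a win, -1 for a loss, 0 for a tie; fully deterministic given both inputs."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(strategy: dict, their_move: str) -> Fraction:
    return sum(p * payoff(move, their_move) for move, p in strategy.items())


def worst_case(strategy: dict) -> Fraction:
    """Payoff against an opponent who knows our strategy and picks the best reply."""
    return min(expected_payoff(strategy, reply) for reply in MOVES)


pure_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}
uniform = {move: Fraction(1, 3) for move in MOVES}
print(worst_case(pure_rock))  # -1
print(worst_case(uniform))    # 0
```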


My models contain what I consider important and fundamental differences between any two of these games:  Chess, Battleship, Slay the Spire, and Clue.

Yet, you can gain an advantage in any of these games by thinking carefully about your game model and its implications.

Comment by Dweomite on One-shot strategy games? · 2024-03-19T02:50:39.760Z · LW · GW

If your definition of "hidden information" implies that chess has it then I think you will predictably be misunderstood.

Terms that I associate with (gaining advantage by spending time modeling a situation) include:  thinking, planning, analyzing, simulating, computing ("running the numbers").

Comment by Dweomite on One-shot strategy games? · 2024-03-19T02:41:04.168Z · LW · GW

I haven't played it, but someone disrecommended it to me on the basis that there was no way to know which skills you'd need to survive the scripted events except to have seen the script before.

Comment by Dweomite on One-shot strategy games? · 2024-03-18T22:37:31.892Z · LW · GW

Unless I'm mistaken, StS does not have any game actions the player can take to learn information about future encounters or rewards in advance.  Future encounters are well-modeled as simple random events, rather than lurking variables (unless we're talking about reverse-engineering the PRNG, which I'm assuming is out-of-scope).

It therefore does not demonstrate the concept of value-of-information.  The player can make bets, but cannot "scout".

(Though there might be actions a first-time player can take to help pin down the rules of the game, that an experienced player would already know; I'm unclear on whether that counts for purposes of this exercise.)

Comment by Dweomite on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-17T21:48:27.460Z · LW · GW

While considering this idea, it occurred to me that you might not want whatever factions exist at the time you create a government to remain permanently empowered, given that factions sometimes rise or fall if you wait long enough.

Then I started wondering if one could create a system that somehow dynamically identifies the current "major factions" and gives de-facto vetoes to them.

And then I said: "Wait, how is that different from just requiring some voting threshold higher than 50% in order to change policy?"
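The arithmetic behind that last question, as a sketch: if a change needs a yes-share of at least some threshold t, then any bloc (or coalition) holding more than 1 - t of the votes can block it on its own, whoever that bloc happens to be at the time. The seat shares below are hypothetical, and "a change passes at or above the threshold" is an assumed convention.

```python
from fractions import Fraction


def can_block(bloc_share: Fraction, threshold: Fraction) -> bool:
    """A bloc blocks a change if everyone else together falls short of the threshold."""
    return 1 - bloc_share < threshold


def blocs_with_veto(shares: dict, threshold: Fraction) -> list:
    return [name for name, share in shares.items() if can_block(share, threshold)]


# Hypothetical seat shares for three factions.
shares = {"A": Fraction(40, 100), "B": Fraction(35, 100), "C": Fraction(25, 100)}
print(blocs_with_veto(shares, Fraction(1, 2)))  # []
print(blocs_with_veto(shares, Fraction(2, 3)))  # ['A', 'B']
```

Raising the threshold hands a de facto veto to every faction above the 1 - t line, with no need to name the factions in advance.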

Comment by Dweomite on One-shot strategy games? · 2024-03-14T01:12:02.937Z · LW · GW

It's good to clarify that you're looking for examples from multiple genres, though I'd caution you not to write off all "roguelikes" too quickly just because you've already found one you liked.  There are some games with the "roguelike" tag that have little overlap other than procedural content and permadeath.

For instance, Slay the Spire, Rogue Legacy, and Dungeons of Dredmor have little overlap in gameplay, though they are all commonly described as "roguelike".  (In fact, I notice that Steam now seems to have separate tags for "roguelike deckbuilder", "action roguelike", and "traditional roguelike"--though it also retains the generic "roguelike" tag.)

And that's without even getting into cases like Sunless Sea where permadeath and procedural generation were tacked onto a game where they're arguably getting in the way more than adding to the experience.

Comment by Dweomite on One-shot strategy games? · 2024-03-12T00:17:57.401Z · LW · GW

Wow.  I'm kind of shocked that the programmer understands PRNGs well enough to come up with this strategy for controlling different parts of the game separately and yet thinks that initializing a bunch of PRNGs to exactly the same seed is a good idea.
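To illustrate why identical seeds defeat the purpose of separate generators: two PRNGs started from the same seed produce the same sequence, so every subsystem sees the same "random" numbers in lockstep. One common fix, shown here as an assumption rather than as what the game actually does, is to derive each subsystem's seed by hashing the master seed together with the subsystem's name. The seed and subsystem names are hypothetical.

```python
import hashlib
import random

MASTER_SEED = 12345  # hypothetical seed for one run


def naive_streams(names):
    # Every subsystem gets the *same* seed, so every stream yields the same sequence.
    return {name: random.Random(MASTER_SEED) for name in names}


def derived_streams(names):
    # Derive a distinct, reproducible seed per subsystem from (master seed, name).
    streams = {}
    for name in names:
        digest = hashlib.sha256(f"{MASTER_SEED}:{name}".encode()).digest()
        streams[name] = random.Random(int.from_bytes(digest, "big"))
    return streams


names = ["map", "card_rewards", "relics"]
naive = naive_streams(names)
print([naive[n].random() for n in names])    # three identical numbers
derived = derived_streams(names)
print([derived[n].random() for n in names])  # three unrelated numbers
```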

Nice find, though.  Thanks for the info!

(I note the page you linked is dated ~4 years ago; it seems possible this has changed since then.)

Comment by Dweomite on One-shot strategy games? · 2024-03-11T18:19:21.154Z · LW · GW

Another possible reason to disrecommend it is that it's hugely popular.

(The more popular a game, the more of your audience has already played it and therefore can't participate in a "blind first run" exercise based on it.)

Comment by Dweomite on One-shot strategy games? · 2024-03-11T18:04:10.810Z · LW · GW

Is Slay the Spire programmed in such a way that giving players the same random seed will ensure a meaningfully-similar experience?

If it were programmed in the simple and obvious way, where it generates random numbers exactly when they're needed, I wouldn't particularly expect 2 players to see similar things.  For example, suppose at the end of her first battle, Alice's card rewards are determined by (say) her 21st, 22nd, and 23rd random outputs.  But Bob plays a slightly more conservative strategy in the first battle and takes 1 turn longer to beat it, meaning he draws 5 more random cards and the enemy takes 1 more action, so Bob's card rewards are determined by the 27th, 28th, and 29th random outputs and have no overlap with Alice's.

Statistically, I'd still expect more overlap than if they used different seeds--if you try this many times, they'll sometimes be "synchronized" by coincidence--but I'd still expect their results to be more different than the same, unless they also play very similarly.

I could imagine deliberately programming the game in a way where various important things (like battles and rewards) are generated in advance, before the player starts making choices, so that they're affected only by the seed and not by what the player has done before reaching them.  But that sounds like extra work that wouldn't matter except for exercises like this, so I'd be moderately surprised if they did that.  (I haven't checked.)
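The desynchronization argument can be simulated, as a rough sketch with made-up numbers. Both functions start from the same seed; Bob's longer first battle makes 6 extra calls to the PRNG (5 extra card draws plus 1 extra enemy action), so with a single shared stream his three card-reward rolls come from calls 27 to 29 instead of Alice's 21 to 23. In the second function, rewards for ten hypothetical upcoming battles are generated before play begins, the kind of up-front generation described above, so the players' choices can no longer shift them.

```python
import random

SEED = 2024       # hypothetical shared seed
BASE_CALLS = 20   # PRNG calls Alice's first battle makes (card draws, enemy actions)


def rewards_shared_stream(extra_calls: int) -> list[int]:
    """Simple approach: one PRNG, called at the moment each number is needed."""
    rng = random.Random(SEED)
    for _ in range(BASE_CALLS + extra_calls):
        rng.random()                               # battle 1: draws and enemy actions
    return [rng.randrange(100) for _ in range(3)]  # calls 21-23 for Alice, 27-29 for Bob


def rewards_pregenerated(extra_calls: int) -> list[int]:
    """Alternative: roll every battle's rewards up front, before the player acts."""
    rng = random.Random(SEED)
    rewards_by_battle = [[rng.randrange(100) for _ in range(3)] for _ in range(10)]
    for _ in range(BASE_CALLS + extra_calls):
        rng.random()                               # battle 1 no longer affects rewards
    return rewards_by_battle[0]


alice, bob = 0, 6
print(rewards_shared_stream(alice), rewards_shared_stream(bob))  # almost certainly different
print(rewards_pregenerated(alice), rewards_pregenerated(bob))    # identical
```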

Comment by Dweomite on One-shot strategy games? · 2024-03-11T05:15:20.433Z · LW · GW

Point of comparison: Slay the Spire consistently takes me ~3 hours.  (I have a slow, thoughtful play style.)

Comment by Dweomite on One-shot strategy games? · 2024-03-11T05:01:44.874Z · LW · GW

Searching for games that fit, I am reminded that there's a frustrating number of games that have a difficult mode but refuse to let you play it until you've beaten the game on an easier setting (sometimes more than once!).  It might be possible to work around that by copying someone's save file.

This list isn't super filtered, but here are some games that seem vaguely in line with your request:


Solar Settlers
Short strategy game with somewhat-risky exploration, pointed resource optimization, and exponential growth (if you play well).  Advertises a 10-minute playtime, but I think (this was years ago) that my games lasted more like 1-2 hours; I expect this is partly because I'm slow but partly because the better you play the more stuff you need to manage.

I think (66% confidence) the difference between difficulty levels is just how many points you need to count as a "win", and that you can keep playing after you reach that threshold, so you could maybe tune the difficulty of the exercise by asking for a different score than what the game says.  (Though IIRC there's a regular play mode and a skill-calibration mode, and I think only one of those lets you keep playing after you reach the target score, and I don't remember which.)


Defense of the Oasis
Explore the map and invest your followers into exploiting various terrain features to prepare for a barbarian invasion.  Short levels escalate in difficulty, play until you die.  Starts easy, but you could probably pick some number of stages (or some score threshold) that would be challenging.


Various Roguelike Deckbuilders
Those I've played tend to be a bit longer than you asked for, but not hugely so.  The default difficulty is often hard for players unfamiliar with these types of games (but easy for veterans), and there are often harder difficulties to unlock.  There's typically "rules-based" hidden information in the form of not knowing the full set of cards and challenges that exist in the game, but rarely any "gamestate-based" hidden information.

The best-known is Slay the Spire.  I think the original, and the hardest I've played, is Dream Quest, but it's very luck-dependent.  Some others that have informed my impressions of this subgenre include Monster Train and Roguebook.  There are a zillion others nowadays.

You mentioned Luck be a Landlord, which is sort of on the edge of this category; compared to most I've played, it's faster and simpler, and has no hand management.  Another game I've played that's on the edge of this category is Crop Rotation, which is more of a tableau-builder than a deckbuilder since all your cards are available at once (there's still luck, but mostly in what cards are offered in drafts).


Into the Breach
A tactical battle game where the enemies need to charge up their attacks and you can make them miss (or even hit each other) by moving things around, and you gradually upgrade over a series of short missions.  I think this one actually lets you play on hard mode from the start, but I don't remember for sure.  Not much in the way of hidden info.

You can vary the game length by choosing to play 2, 3, or 4 islands before doing the finale.

ETA:  Tyrant's Blessing has very similar gameplay and is much less likely to have already been played by your audience (though it's so similar that I'd expect a lot of skill transfer, and I'm uncertain about the difficulty and playtime).


Renowned Explorers: International Society
This is likely too long, but allegedly there are players who can finish a run in under 2 hours.  It's basically a combination of story events and turn-based tactical battles, with periodic breaks to spend your accumulated resources on upgrades.  There's a mechanic where you can do extra events for more resources at the cost of taking penalties that make the fights more difficult.

It's not that hard, but if you restrict yourself to only choosing expeditions from the highest-unlocked difficulty rating then most players probably won't win their first run.  Experience at tactical skirmish games may provide a significant advantage.


ETA:  Thought of a couple more, although these seem even less promising to me:


Farm Keeper
Turn-based economic strategy where you need to make escalating rent payments.  The starting difficulty is almost certainly too low for you, and once again, it only unlocks higher difficulties one at a time as you win.  Meeting various conditions while playing also permanently unlocks "secret tiles" that make all future runs easier by giving you more (and typically better) options.  Harder modes are also longer and may exceed the playtime you want.  So setting up an appropriate challenge might be painful.

It's also somewhat unfair as a one-shot exercise, because as you play you get access to new tiles that substantially alter the optimal layout, but you can't move your existing tiles and you can't see the late-game tiles in advance, so you can end up getting punished for having done a good job of optimizing the variables that you knew about.  (Though conceivably this teaches something about flexibility?)


Lueur and the Dim Settlers
This game actually isn't even out yet, but there's a free demo.  The demo is certainly too easy for your purposes.  But it's a relatively short turn-based survival strategy game where exploration plays an important role.  (Note: I have only a vague memory of how long this took to play.)

I think there were some simple action-based mini-games, too, so it's not pure strategy.


ETA Again:  If unknown information is not such a priority, one could also look at some solo-friendly strategy board games--several of which have digital versions, such as Spirit Island, Aeon's End, or One Deck Dungeon.

Comment by Dweomite on One-shot strategy games? · 2024-03-11T02:02:40.634Z · LW · GW

Several thoughts occur, in relation to this list of criteria.

  • Is 30-120 minutes supposed to be how long the game would take if you were playing normally, or if "you were taking a long time to think each turn or pausing a lot"?  I'm a pretty thoughtful and methodical player in strategy games, and some genres of game seem to take me as much as 2-5x as long to play as they take for a typical player, so this can make quite a difference.  (Though the upper end of that range may only apply if I'm actually pausing to take notes.)
  • Are you looking for games where a typical player on their first play will lose, or where they will take longer to win than the given timeframe, or is either fine?
  • With respect to "value of information", I usually make a distinction between learning gamestate and learning rules--for instance, in Poker, the first would be learning what cards someone has, while the second would be learning whether or not a straight beats a flush.
    • Hidden gamestate generally has precise bounds on what it could be (if you already know the rules), but is usually made deliberately unguessable within those bounds.  The rules of the game have no strict boundaries on what they could be, but usually aren't designed to be mysterious, and your ability to guess them may be extremely dependent on how much experience you have with similar games.
    • It would be extremely unusual for a game to take into account how much a move reveals about the rules of the game when balancing them.
    • Most modern video games are designed to teach you the rules during your first game (which may actually be an obstacle if you want an exercise where the rules are taken as known).
  • My impression is that most games with exploration as a mechanic are optimized to deliver feelings of discovery, rather than interesting strategy around explore/exploit trade-offs.
  • Any game where information is valuable, and where gathering less than the maximum amount of helpful information is a viable strategy, is a game with a significant amount of luck.  (Since this implies you can gamble on unknown information at reasonable odds.)  This means players who only play once will not have a reliable measurement of how well they actually played.
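A toy value-of-information calculation, with invented payoffs, shows both halves of this point. A safe option pays 5; a gamble pays 10 if a hidden variable turns out "good" (probability 1/2) and 0 otherwise; scouting reveals the variable before you choose. The value of information is the expected payoff of choosing after scouting minus the expected payoff of choosing blind.

```python
from fractions import Fraction

# Hypothetical decision problem.
P_GOOD = Fraction(1, 2)
SAFE, GAMBLE_GOOD, GAMBLE_BAD = 5, 10, 0


def ev_without_scouting() -> Fraction:
    gamble = P_GOOD * GAMBLE_GOOD + (1 - P_GOOD) * GAMBLE_BAD
    return max(Fraction(SAFE), gamble)


def ev_with_scouting() -> Fraction:
    # Decide after seeing the hidden variable, so take the better option in each case.
    return P_GOOD * max(SAFE, GAMBLE_GOOD) + (1 - P_GOOD) * max(SAFE, GAMBLE_BAD)


value_of_information = ev_with_scouting() - ev_without_scouting()
print(value_of_information)  # 5/2: scouting pays off only if it costs less than this
```

If scouting cost close to 5/2, both scouting and skipping it would be viable, and a player who skipped it and took the gamble would walk away with either 10 or 0, so a single play would say little about how well they decided.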
Comment by Dweomite on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-26T23:19:29.036Z · LW · GW

On a literal level, I can't play "a level I haven't played before", which is what the instructions call for.

On a practical level, I've already spent multiple hours beating my head against this wall, and when I stopped I had no remotely promising ideas for how to make further progress.  (And most of that time was spent staring at the puzzle and thinking hard without interacting with it, so it was already kind of similar to this exercise.)

Admittedly, this was years ago, so maybe it's time to revisit the puzzle anyway.

I will note that a level editor for this game seems to exist, so in theory you could craft custom levels for this exercise.  Though insofar as the point is being potentially-surprised by the rules, maybe that doesn't help if you aren't inventing new rules as well.

Comment by Dweomite on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-26T23:07:28.372Z · LW · GW

Baba Is You is an unusual puzzle game in a way that seems relevant here.

One way of classifying puzzle games might be on a continuum from logic-based to exploration-based (or, if you like, between logical uncertainty and environmental uncertainty).

At the first extreme you have things like Sudoku, logic grids, the puzzle of three gods named True, False, and Random, or the blue-eyes puzzle.  In these puzzles, you are given all necessary information up-front, and you should (if the puzzle is well-constructed) be able to verify the solution entirely on your own, without requiring an external authority to confirm it.

At the opposite extreme, there are games like Twenty Questions, Mastermind, and Guess Who?, where the entire point is that necessary information is being withheld and you need to interact with the puzzle to expose it.  Knowing all the information is the solution; there would be no point without the concealment.
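At the exploration end, the solver's only access to the answer is through queries. A sketch using a numeric stand-in for Twenty Questions (the range and the "Is it at most n?" question format are assumptions for illustration): each answer halves the remaining candidates, so 20 questions always pin down one of 2^20 possibilities, and without the answers there would be nothing to reason from.

```python
from typing import Callable


def solve(ask: Callable[[int], bool], low: int, high: int) -> tuple[int, int]:
    """Find a hidden integer in [low, high] by asking "Is it at most n?".

    Returns (answer, number of questions asked).
    """
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if ask(mid):
            high = mid
        else:
            low = mid + 1
    return low, questions


secret = 777_777  # hidden from the solver; reachable only through `ask`
print(solve(lambda n: secret <= n, 0, 2**20 - 1))  # (777777, 20)
```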

Baba Is You is pretty close to the first extreme, but not all the way there.  It does ask you to learn the basic rules of the game by interacting with it, and it does gradually introduce new rules, but most of the difficulty comes from logical uncertainty.  Some puzzles do not introduce new rules at all, or only introduce new rules in the sense of exploring the edge cases of a previously-established rule.  It also makes the entire puzzle visible at once, so once you understand the rules it becomes a pure logic puzzle.

This exercise relies on the possibility of being empirically surprised, but also on being able to make fairly detailed plans in spite of that possibility.  This seems like it requires (or at least heavily benefits from) being in a pretty narrow area within the logic <=> exploration continuum, which is exactly where Baba Is You happens to sit.

Most puzzle video games lean more heavily on exploration than that.  You mentioned The Witness, which I would classify as primarily exploration-based:  each series of puzzles centers around a secret rule that you need to infer through experimentation, and most puzzles are easy once you have figured out the secret rule.  (The game Understand, mentioned by another commenter, has the same premise.)

Another puzzle game I recognize from the bundle you linked is Superliminal, which has the premise that you're inside a dream and solve puzzles using dream-logic.  I'd also consider that heavily exploration-based.

The Talos Principle is much closer to Baba Is You's point on this continuum, with a relatively small number of rules and an emphasis on applying them creatively.  However, in The Talos Principle you can't always see the entire puzzle before you begin solving it, and I'd say the puzzle components' appearances are less suggestive of their functions than the adjectives in Baba Is You, which probably makes it significantly harder to guess how they'll behave without doing some experimentation.

Patrick's Parabox is similar to Baba Is You in that they are both Sokoban games, though I didn't play too far in Patrick's Parabox because the puzzles felt more workaday and less mind-bendy and I just got bored.  (Though it's highly rated, so presumably most people didn't.)

Comment by Dweomite on Exercise: Planmaking, Surprise Anticipation, and "Baba is You" · 2024-02-26T22:22:15.811Z · LW · GW

Unless you've literally beaten the entire game, this exercise works if you play a level you haven't played before.

I haven't beaten every level in the game, but I don't have access to any levels that I haven't played before, because the reason I stopped playing was that I had already tried and failed every remaining available level.

(Though I suppose I could cheat and look up the solution for where I got stuck...)

Summarize the key concept of the solution

This might not apply to the early levels you've focused on in your examples, but an observation I made while playing the more advanced levels of this game was that often there was not just one key concept.

In most puzzle games that I've played, I find I can quickly get a sort of feel for the general shape of the solution:  I start here, I have to end up there, therefore there must be a step in the middle that bridges those two.  This often narrows the possible search space quite a lot, because the missing link has to touch the parts I know about at both the beginning and the end.

Lots of puzzles in Baba Is You have two significant steps in the middle.  And this is a huge jump in difficulty, because it means there's an intermediate state between those two steps and I have no idea what that intermediate state looks like so I can't use it to infer the shape of those steps.  Each of the missing steps has only 1 constraint instead of 2.

Comment by Dweomite on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-13T05:12:04.857Z · LW · GW

You heavily implied that Roko had assigned that probability to that event, and that implication is false.

Comment by Dweomite on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-13T02:49:10.694Z · LW · GW

I'm trying to get better at noticing when the topic of a conversation has drifted, so that I don't unwittingly feel pressured to defend a position that is stronger, or broader, or just different, from what I was trying to say.

I was originally trying to say:  When you said Roko's number implied he thought people in Wuhan were less likely than the global average to be patient zero in a pandemic, I think that was an important misrepresentation of Roko's actual argument.

I notice that we no longer seem to be discussing that, or anything that could plausibly change anyone's opinion on that.  So I'm going to stop here.

(I'm not necessarily claiming this is the first point in this conversation where I could have noticed this.  Like I said, trying to get better.)

Comment by Dweomite on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-12T06:35:11.790Z · LW · GW

I did see this, but didn't find it convincing. China has become substantially more urban, more interconnected, more populous, and more connected to the outside world even over the past 10 or 20 years. A claim like this requires substantially more thorough analysis.

Your first comment seemed to take the position that the OP's number was not merely different from yours, but indefensible, and you gave a lower bound for a defensible prior that was 1.4x higher than the number you were complaining about.

I feel like you have softened your position to the point where it no longer supports your original comment (from "timing is not even a consideration" to "this timing argument is less thorough than I think it ought to be").  If this is because you changed your mind, great!  If not, then I'm confused about how these comments are meant to be squared.

Are you claiming the timing argument is so weak that no reasonable person could possibly estimate its Bayes factor as >1.4?  I don't feel like you've come close to justifying a claim like that.
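To make the ">1.4" threshold concrete, here is a small sketch of Bayesian updating in odds form.  The 11 million / 7.9 billion location-only figure is the one from the earlier comment; reading the OP's number as roughly 1/1000 (i.e. 1.4x lower) is my interpretation of the comparison above, and the function name `update` is just illustrative.

```python
def update(prob: float, bayes_factor: float) -> float:
    """Update a probability by a Bayes factor, working in odds form.

    bayes_factor > 1 means the evidence favors the hypothesis;
    bayes_factor < 1 means the evidence counts against it.
    """
    prior_odds = prob / (1 - prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Location-only lower bound from the earlier comment: roughly 1.4 in 1000.
location_only = 11e6 / 7.9e9

# A timing argument with Bayes factor 1.4 *against* the zoonotic hypothesis,
# i.e. a factor of 1/1.4 applied to the zoonotic odds.
after_timing = update(location_only, 1 / 1.4)

print(f"location only: {location_only:.5f}")
print(f"after timing:  {after_timing:.5f}")
```

A timing factor of just 1.4 moves the ~0.00139 location-only figure to just under 0.001, so the OP's number is only "indefensible" if no reasonable person could rate the timing evidence that strongly.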

And, again, is it reasonable to start researching and make COVID in the ~2 year time window given?

I have no idea!  What's your 90% CI for how long it would take them, and what evidence are you relying on for that?

I think that whatever the next pandemic out of Southern or Central China or Southeast Asia is, the WIV (or some other lab in the region) is extremely likely to have a sample of a related virus and studied it.

I previously thought you were claiming "the unconditional probability of a naturally-occurring pandemic to be a bat coronavirus is ~1".  This claim differs from that in several ways.  Thank you for clarifying!

Making the probability conditional on location of origin:  Absolutely fair; we already accounted for the improbability of the location.  I missed this.

On the category of the disease we are matching:  "bat coronavirus" may be too narrow (though I got that phrase from you), but "have a sample and have studied it" seems too broad.  What's your probability if we change that to "are currently performing gain-of-function research on it"?

(I also notice your claim is phrased such that it presumes any pandemic will be caused by a virus, but I'm assuming that was accidental and your claim generalizes to all vectors.)

I'm somewhat surprised that you're so skeptical of this; I don't think anyone was ever in doubt that bat coronaviruses spilling into humans in Southeast Asia this part of China has been considered a likely problem for a long time. 

"This is likely to happen" and "there's approximately a 100% chance that the very next problem in this general category will be this" are not the same, and are not close to being the same.

Comment by Dweomite on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-12T01:39:31.752Z · LW · GW

The timing is given by such a weak argument that I did ignore it, yes. WIV has been studying bat coronaviruses for years, and probably will continue to do so for years, and the only thing to tie it so closely in time is a rejected grant proposal that emphasized having the actual work done at UNC. 

This was in a section of the OP marked as an edit, so it's possible this level of detail wasn't there the first time you looked:

We must also account for the timing here. Each year in the modern period from, say, 1970 until today has a decently large chance of human-animal transmission, perhaps with some bias towards the present due to more travel. But gain of function is a new invention - it only really started in 2011 and funding was banned in 2014, then the moratorium was lifted in 2017. The 2011-2014 period had little or no coronavirus gain of function work as far as I am aware. So coronavirus gain of function from a lab could only have occurred after say 2010 and was most likely after 2017 when it had the combination of technology and funding. This is a period of about 2 years out of the entire 1920-2020 hundred-year window. Now, we could probably discount that hundred year window down to say an equivalent of 40 years as people have become more mobile and more numerous in China over the past 100 years, on average. But that is still something like a 1 in 20 chance that the worst coronavirus pandemic of the past hundred years happened in the exact 2-year window when gain of function research was happening most aggressively, and that is independent from the location coincidence.

Note this reasoning does not rely on the grant proposal.

"What kind of disease" has Bayes Factor 1. 

You appear to have more knowledge of virology than I do, but this is far too implausible (on my model) for me to believe it merely because you declared it.  I've heard of many plagues that were not bat coronaviruses.  Your prior on the next naturally-occurring pandemic being a bat coronavirus cannot plausibly be ~100% unless you know some hitherto-unmentioned information that would be very startling to me.

Comment by Dweomite on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-11T09:20:30.136Z · LW · GW

Wuhan is a city of 11 million people; the world population is about 7.9 billion. Saying that the prior on a zoonotic origin is anything less than 11 million / 7.9 billion = 1.4/1000 means that you think people living in Wuhan are less likely to be patient 0 than the average person in the entire world.

The odds given in the OP are based on 3 coincidences:

  • Location
  • Timing
  • What kind of disease it was

Your number is only based on 1 of those coincidences (location).  It is not surprising that the probability of one of those things is higher than the probability of all 3 at once.
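A quick way to see why: if the coincidences are treated as independent (as the OP does when it calls timing "independent from the location coincidence"), their probabilities multiply, and since every factor is at most 1, each extra coincidence can only shrink the product.  This sketch uses the Wuhan population share and the OP's rough 2-years-out-of-40 timing estimate; I've left out the "kind of disease" factor because neither comment gives a number for it, but any value at most 1 would only shrink the result further.

```python
from math import prod


def joint_probability(factors: dict[str, float]) -> float:
    """Probability that several independent coincidences all occur."""
    for name, p in factors.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability, got {p}")
    return prod(factors.values())


location = 11e6 / 7.9e9  # share of the world population living in Wuhan
timing = 2 / 40          # OP's estimate: a ~2-year window out of ~40 effective years

one_coincidence = joint_probability({"location": location})
two_coincidences = joint_probability({"location": location, "timing": timing})

print(f"location only:       {one_coincidence:.6f}")
print(f"location and timing: {two_coincidences:.6f}")
```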

Comment by Dweomite on Bayesians Commit the Gambler's Fallacy · 2024-01-09T07:53:40.945Z · LW · GW

Are "switchy" and "streaky" accepted terms-of-art?  I wasn't previously familiar with them and my attempts to Google them mostly lead back to this exact paper, which makes me think this paper probably coined them.

Comment by Dweomite on Bayesians Commit the Gambler's Fallacy · 2024-01-09T02:01:38.985Z · LW · GW

This seems like a difficult situation because they need to refer to the particular way-of-betting that they are talking about, and the common name for that way-of-betting is "the gambler's fallacy", and so they can't avoid the implication that this way-of-betting is based on fallacious reasoning except by identifying the way-of-betting in some less-recognizable way, which trades off against other principles of good communication.

I suppose they could insert the phrase "so-called".  i.e. "Bayesians commit the so-called Gambler's Fallacy".  (That still funges against the virtue of brevity, though not exorbitantly.)

What would you have titled this result?

Comment by Dweomite on Meaning & Agency · 2024-01-08T23:16:15.060Z · LW · GW

These definitions seem like they only allow Alice to recognize agents that are "as strong as" Alice in the sense that Alice doesn't think she could improve on their decisions.  For instance, Alice won't endorse Bob's chess moves if Alice can play chess better than Bob (even if Carol would endorse Bob's moves).  Have I understood correctly?

Comment by Dweomite on Principles For Product Liability (With Application To AI) · 2023-12-12T10:05:54.481Z · LW · GW

Thanks for doing research!

Your link goes on to say:

To ensure that you are treating the LLC as a separate legal entity, the owners must:

  • Avoid co-mingling assets. The LLC must have its own federal employer identification number and business-only checking account. An owner’s personal finances should never be included in the LLC’s accounting books. All business debts should be paid out of the LLC’s dedicated bank account.
  • Act fairly. The LLC should make honest representations regarding the LLC’s finances to vendors, creditors or other interested parties.
  • Operating Agreement. Have all the members executed a formal written operating agreement that sets forth the terms and conditions of the LLC’s existence.

I imagine the bite is in the "act fairly" part?  That sounds distressingly like the judge just squints at your LLC and decides whether they think you're being reasonable.

Comment by Dweomite on Principles For Product Liability (With Application To AI) · 2023-12-11T19:42:37.886Z · LW · GW

Naive question:  What stops a company from conducting all transactions through LLCs and using them as liability shields?

I'm imagining something like this:  Instead of me selling a car to Joe, I create an LLC and loan it the money to buy a car.  I sell the car to the LLC for the loaned money, the LLC sells the car to Joe for the same price, and it uses Joe's money to repay the loan.  That leaves the LLC with zero assets, and leaves no direct business relationship between me and Joe.

I imagine we must already have something that stops this from working, but I don't know what it is.
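For concreteness, here is the transaction chain traced as a simple ledger, to check the claim that the LLC ends up with nothing.  The $20,000 price and the `Party`/`pay`/`hand_over_car` names are hypothetical; the sketch only tracks cash and who holds the car, and ignores taxes, fees, and whatever the law actually says about such arrangements.

```python
from dataclasses import dataclass


@dataclass
class Party:
    name: str
    cash: int = 0
    has_car: bool = False


def pay(payer: Party, payee: Party, amount: int) -> None:
    """Move cash from one party to another."""
    payer.cash -= amount
    payee.cash += amount


def hand_over_car(seller: Party, buyer: Party) -> None:
    """Transfer ownership of the (single) car."""
    assert seller.has_car, f"{seller.name} has no car to hand over"
    seller.has_car, buyer.has_car = False, True


PRICE = 20_000  # hypothetical sale price

me = Party("me", cash=PRICE, has_car=True)
llc = Party("LLC")
joe = Party("Joe", cash=PRICE)

pay(me, llc, PRICE)     # 1. I loan the LLC the purchase price
pay(llc, me, PRICE)     # 2. the LLC buys my car with the loaned money
hand_over_car(me, llc)
pay(joe, llc, PRICE)    # 3. the LLC sells the car to Joe at the same price
hand_over_car(llc, joe)
pay(llc, me, PRICE)     # 4. the LLC repays my loan with Joe's money

for party in (me, llc, joe):
    print(party)
```

The end state for me and Joe is the same as a direct sale (I gain the price and lose the car; Joe pays the price and gains the car), while the LLC's books net out to zero, which is exactly what would make it an empty target for a liability claim.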

Comment by Dweomite on Principles For Product Liability (With Application To AI) · 2023-12-11T19:34:27.917Z · LW · GW

If you allow the provider of a product or service to contract away their liability, I predict in most cases they will create a standard form contract that they require all customers to sign that transfers 100% of the liability to the customer in ~all circumstances, which presumably defeats the purpose of assigning it to the provider in the first place.

Yes, customers could refuse to sign the contract.  But if they were prepared to do that, why haven't they already demanded a contract in which the provider accepts liability (or provides insurance), and refused to do business without one?  Based on my observations, in most cases, ~all customers sign the EULA, and the company won't even negotiate with anyone who objects because it's not worth the transaction costs.

Now, even if you allow negotiating liability away, it would still be meaningful to assign the provider liability for harm to third parties, since the provider can't force third parties to sign a form contract (they will still transfer that liability to the customer, but this leaves the provider as second-in-line to pay, if the customer isn't caught or can't pay).  So this would matter if you're selling the train that the customer is going to drive past a flammable field like in the OP's example.  But if you're going to allow this in the hospital example, I think the hospital doesn't end up keeping any of the liability John was trying to assign them, and maybe even gets rid of all of their current malpractice liability too.

Comment by Dweomite on Principles For Product Liability (With Application To AI) · 2023-12-11T06:16:50.720Z · LW · GW

Who should be second in line for liability (when the actual culprit isn't caught or can't pay) is a more debatable question, I think, but I still do not see any clear reason for a default of assigning it to the product manufacturer.

Your principle 3 says we should assign liability to whoever can most cheaply prevent the problem.  My model says that will sometimes be the manufacturer, but will more often be the victim, because they're much closer to the actual harm.  For instance, it's cheaper to put your valuable heirloom into a vault than it is to manufacture a backpack that is incapable of transporting stolen heirlooms.  Also consider what happens if more than one product was involved; perhaps the thief also wore shoes!

My model also predicts that in many cases both the manufacturer and the victim will have economically-worthwhile mitigations that we'd ideally like them to perform.  I think the standard accepted way of handling situations like that is to attempt to create a list of mitigations that we believe are reasonable for the manufacturer to perform, then presume the manufacturer is blameless if they did those, but give them liability if they failed to do one that appears relevant.  Yes, this is pretty much what you complained about in your malpractice example.  Our "list of reasonable mitigations" will probably not actually be economically optimal, which adds inefficiency, but plausibly less inefficiency than if we applied strict liability to any single party (and thereby removed all incentive for the other parties to perform mitigations).

Comment by Dweomite on Principles For Product Liability (With Application To AI) · 2023-12-10T22:11:22.241Z · LW · GW

In a typical car accident, the manufacturers of the cars involved would be liable for damages. By Principle 2, this would not mean that nobody can realistically sell cars. Instead, the manufacturer would also be the de-facto insurer, and would probably do all the usual things which car insurance companies do.

I bet you cannot find any insurance company that will insure you against willful damage that you caused with your vehicle.

Workers’ comp is a “no-fault” system: rather than any attempt at a determination of responsibility, the employer is simply always liable (except in cases of willful misconduct).

Oh, what a remarkable exception!

I think you're making a vital error by mostly-ignoring the difference between user negligence and user malice.

Changing a design to defend against user error is (often) economically efficient because it's a central change that you make one time and it saves all the users the costs of being constantly careful, which are huge in aggregate, because the product is used a large number of times.

Changing a design to defend against user malice is (often) not economically efficient, because for users to defend against their own malice is pretty cheap (malice requires intent; "don't use this maliciously" arguably has negative cost in effort), while making an in-advance change that defeats malicious intent is very hard (because you have an intelligent adversary, and they can react to you, and you can't react to them).

I think Principle 3 is clearly going to put liability for writing fake reports on the person who deliberately told the AI to write a fake report, rather than on the AI maker.

Additionally, the damage that can be caused by malice is practically unbounded.  This is pretty problematic for a liability regime because a single outlier event can plausibly bankrupt the company even if their long-run expected value is positive.

Comment by Dweomite on The Offense-Defense Balance Rarely Changes · 2023-12-10T05:25:36.429Z · LW · GW

Seems like you expect changes in offense/defense balance to show up in what percentage of stuff gets destroyed.  On my models it should mostly show up in how much stuff exists to be fought over; people won't build valuable things in the first place if they expect them to just get captured or destroyed.

To make that more concrete:


On Cybersecurity:

Computers still sometimes get viruses or ransomware, but they haven’t grown to endanger a large percent of the GDP of the internet.

This seems borderline tautological.  We wouldn't put so much valuable stuff on the Internet if we couldn't (mostly) defend it.

In WW2, when one nation wanted to read another nation's encrypted communications (e.g. Enigma), they'd assemble elite teams of geniuses, and there was a serious fight about it with real doubt about who would win.  A couple centuries before that, you could hire a single expert and have a decent shot at breaking someone's encryption.

Today, a private individual can download open-source encryption software and be pretty confident that no one on earth can break the encryption itself--not even a major government.  (Though they might still get you through any number of other opsec mistakes.)

This is necessary to make modern e-commerce work; if we hadn't had this massive shift in favor of the defender, we'd have way way less of our economy online.  Note especially that asymmetric (public-key) encryption is vital to modern e-commerce, and it was widely assumed to be impossible until its invention in the 1970s; that breakthrough massively favors defenders.

But in that counterfactual world where the offense/defense balance didn't radically shift, you would probably still be able to write that "viruses and ransomware haven't grown to endanger a large percentage of the Internet".  The Internet would be much smaller and less economically-important compared to our current world, but you wouldn't be able to see our current world to compare against, so it would still look like the Internet (as you know it) is mostly safe.


On Military Deaths:

Does anyone have a theory of the offense-defense balance which can explain why the per-capita deaths from war should be about the same in 1640 when people are fighting with swords and horses as in 1940 when they are fighting with airstrikes and tanks?

On my models I expect basically no relation between those variables.  I expect per-capita deaths from war are mostly based on how much population nations are willing to sacrifice before they give up the fight (or stop picking new fights), not on any details of how the fighting works.

In terms of military tactics, acoup claims the offense/defense balance has radically reversed since roughly WW1, with trenches being nearly invincible in WW1 but fixed defenses being unholdable today (in fights between rich, high-tech nations):

The modern system assumes that any real opponent can develop enough firepower to both obliterate any fixed defense (like a line of trenches) or to make direct approaches futile. So armies have to focus on concealment and cover to avoid overwhelming firepower (you can’t hit what you can’t see!); since concealment only works until you do something detectable (like firing), you need to be able move to new concealed positions rapidly. If you want to attack, you need to use your own firepower to fix the enemy and then maneuver against them, rather than punching straight up the middle (punching straight up the middle, I should note, as a tactic, was actually quite successful pre-1850 or so) or trying to simply annihilate the enemy with massed firepower (like the great barrages of WWI), because your enemy will also be using cover and concealment to limit the effectiveness of your firepower (on this, note Biddle, “Afghanistan and the Future of Warfare” Foreign Affairs 82.2 (2003); Biddle notes that even quantities of firepower that approach nuclear yields delivered via massive quantities of conventional explosives were insufficient to blast entrenched infantry out of position in WWI.)

Comment by Dweomite on What I Would Do If I Were Working On AI Governance · 2023-12-09T08:54:03.452Z · LW · GW

That seems like a useful framing.  When you put it like that, I think I agree in principle that it's reasonable to hold a product maker liable for the harms that wouldn't have occurred without their product, even if those harms are indirect or involve misuse, because that is a genuine externality, and a truly beneficial product should be able to afford it.

However, I anticipate a few problems that I expect will cause any real-life implementation to fall seriously short of that ideal:

  1. The product can only justly be held liable for the difference in harm, compared to the world without that product.  For instance, maybe someone used AI to write a fake report, but without AI they would have written a fake report by hand.  This is genuinely hard to measure, because sometimes the person wouldn't have written a fake if they didn't have such a convenient option, but at the same time, fake reports obviously existed before AI, so AI can't possibly be responsible for 100% of this problem.
  2. If you assign all liability to the product, this will discourage people from taking reasonable precautions.  For instance, they might stop making even a cursory attempt to check if reports look fake, knowing that AI is on the hook for the damage.  This is (in some cases) far less efficient than the optimal world, where the defender pays for defense as if they were liable for the damage themselves.  
    In principle you could do a thing where the AI pays for the difference in defense costs plus the difference in harm-assuming-optimal-defense, instead of for actual harm given your actual defense, but calculating "optimal defense" and "harm assuming optimal defense" sounds like it would be fiendishly hard even if all parties' incentives were aligned, which they aren't.  (And you'd have to charge AI for defense costs even in situations where no actual attack occurred, and maybe even credit them in situations where the net result is an improvement to avoid overcharging them overall?)
  3. My model of our legal system--which admittedly is not very strong--predicts that the above two problems are hard to express within our system, that no specific party within our system believes they have the responsibility of solving them, and that therefore our system will not make any organized attempt to solve them.
    For instance, if I imagine trying to persuade a judge that they should estimate the damage a hand-written fake report would have generated and bill the AI company only for the difference in harm, I don't have terribly high hopes of the judge actually trying to do that.  (I am not a legal expert and am least certain about this point.)
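The alternative charge described in point 2 can be written as a formula.  The symbols are my own shorthand, not standard legal or economic notation: subscript 1 means the world with the AI product and subscript 0 the world without it, $D$ is the cost of optimal defense, and $H^{*}$ is the harm that still occurs assuming optimal defense.

```latex
L_{\text{AI}} = \underbrace{(D_1 - D_0)}_{\text{extra cost of optimal defense}}
              + \underbrace{(H_1^{*} - H_0^{*})}_{\text{extra harm under optimal defense}}
```

Because it is a difference between two worlds, $L_{\text{AI}}$ can come out negative when the product makes things better overall, which corresponds to the "credit them" case in the parenthetical.  The hard part is that $D$ and $H^{*}$ are counterfactual quantities that no party directly observes.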

Comment by Dweomite on What I Would Do If I Were Working On AI Governance · 2023-12-09T02:43:50.702Z · LW · GW

The sort of cases I’d (low-confidence) expect to pursue relatively early on would be things like:

  • Find a broad class of people damaged in some way by hallucinations, and bring a class-action suit against the company which built the large language model.
  • Find some celebrity or politician who’s been the subject of a lot of deepfakes, and bring a suit against the company whose model made a bunch of them.
  • Find some companies/orgs which have been damaged a lot by employees/contractors using large language models to fake reports, write-ups, etc, and then sue the company whose model produced those reports/write-ups/etc.

The second two--and possibly the first one, but it's hard to tell because it's kinda vague--feel pretty bad to me.  Like, if you ignore the x-risk angle, imagine that current AI is roughly as strong as AI will ever be, and just look at this from a simple product liability angle, then making AI creators liable for those things strikes me as unreasonable and also bad for humanity.  Kinda like if you made Adobe liable for stuff like kids using Photoshop to create fake driver's licenses (with the likely result that all legally-available graphics editing software will suck, forever).

Curious if you disagree and think those would be good liability rules even if AI progress were frozen, or if you're viewing this as a sacrifice that you're willing to make to get a weapon against x-risk?

Comment by Dweomite on On Trust · 2023-12-06T22:50:34.698Z · LW · GW

In security engineering, a trusted component of a system is a component that has the ability to violate the system's security guarantees.  For instance, if a security engineer says "Alice is trusted to guard the cookie jar", that means "Alice has the ability to render the cookie jar unguarded".

I notice that the four examples at the beginning of this post all seem to slot pretty nicely into this definition:

  • "I decided to trust her" => I assigned Alice to guard the cookie jar by herself
  • "Should I trust him?" => Should I allow Bob to guard the cookie jar?
  • "Trust me" => Please allow me to access the cookie jar
  • "They offered me their trust" => They granted me access to the cookie jar

If you think about them as being about security policies, rather than epistemic states, then they seem to make a lot more sense.

I think the layperson's informal concept of "trust" is more muddled than this, and conflates "I'm giving you the power to violate my security" with "I am comfortable with you having the power to violate my security" and maybe some other stuff.

Comment by Dweomite on Redirecting one’s own taxes as an effective altruism method · 2023-12-01T05:14:43.318Z · LW · GW

In my model, every act of societal rule-breaking slightly undermines literally every societal rule (although if the rule in question is bad enough this might be worth it).  So that's a trivial "yes".

If we restrict things to more direct effects, I think most people are realistically going to interpret your policy as "don't pay taxes that I personally don't agree with" rather than "don't pay income taxes in particular, there is something a priori special about income taxes specifically that puts them into a fundamentally different category from all other taxes, this is definitely not a category that I made up retroactively because it happens to be convenient for me in my current circumstances", no matter how much you protest that your real policy is the second thing.  Therefore if they agree with income tax and disagree with Georgist tax, they will think they can ignore Georgist tax and that you will have no right to complain when they do.  So, again, yes.

Comment by Dweomite on Redirecting one’s own taxes as an effective altruism method · 2023-11-14T01:40:58.083Z · LW · GW

I think that if your government is basically a protection racket, extorting resources from its subjects while providing minimal benefits, then refusing to pay taxes seems pretty ethical to me.

But if you think your government is, overall, something that you would rather preserve than destroy, then I think the ethical case for paying taxes is pretty strong.

Most people would say that you shouldn't steal, murder, cheat your business partners, etc. even if you could get away with it and donate the proceeds to charity.  I think the widely-accepted justifications for not doing that are, broadly:

  • Some form of deontology that says you need to follow those basic rules of fair play even if the utility is not great.
  • Consequentialist reasoning that breaking these rules damages society's ability to coordinate around the rule that was broken (and, to a lesser extent, to coordinate around any rules at all), and this societal ability is so immensely valuable that this outweighs other considerations in most realistic scenarios.  (Especially considering that your lying brain will exaggerate the societal benefits of anything it thinks is in your self-interest.)

If you accept either of those arguments for not robbing a store, and you think your government is on-balance good to have around, then I think you should also accept those same arguments for paying your taxes.  If the government is basically legitimate, then evading taxes is pretty similar to theft.  At minimum, it's defecting from a societal rule that is widely considered important and which has been thoroughly ratified by our standard societal-rule-process.

The consequentialist bullet-point above is similar to the ethical objection that you argue against in the OP, but I think you've imagined the commons-being-damaged too narrowly and thus significantly underestimated the value at stake.  For instance, maybe you think our current tax rules are bad, but perhaps you'd like the ability to have and enforce any tax rules at all, even when some of your fellow citizens disagree with you about what rules are ideal.  Will you be able to have that, after establishing that you think it's OK to refuse to pay taxes just because you think the current system is inefficient?

I also think this specific objection you make is very far-fetched:

Why exactly should I expect the rule of law to collapse (rather than for the government to be reformed or replaced) when the consent of the governed wavers: could the results not just as plausibly be positive ones?

Imagine using this argument against one of the more widely-accepted rules I mentioned above, like murder:

Alice:  You shouldn't murder people, even if you could get away with it and the world would be better off without the specific person being murdered, because that would damage society's ability to coordinate around the very important "generally don't murder people" rule.

Bob:  But if that rule loses popular support, why wouldn't it just be replaced by a new rule?  And couldn't that new rule just as easily be an improvement?

Good rules are a small target in possibility-space and it takes work to hit that target.  If you want to get a better rule, you'd better put a lot of effort into coordinating with other people and carefully channeling your force towards that small target.  It seems incredibly naive to me to think you'll get a good rule automatically just because you smash the existing rule.

Additionally, there must be some historical reason that we have the rule we have.  For important, high-profile, long-standing rules (like murder, or taxes), a plausible guess at that reason would be that it was the best rule our predecessors could realistically get.  Unless you have some specific reason for thinking you can do better than them, it seems fairly unlikely that you could get a substantially better rule even with a highly-coordinated effort.

Bob is also imagining this as a clean switch from rule A to rule B, whereas in reality there will probably be a long period (maybe indefinite) when rule A is damaged enough to become less effective but not damaged enough to collapse.

There's also inevitable collateral damage to other rules.

You complained that the objection you were responding to seemed like a rationalization that someone would make up if they had already decided on the answer and wanted a convenient justification.  But this particular counter-objection seems like pure wishful thinking to me.  Yes, it's possible to imagine a good outcome, but is that outcome likely?  How much effort are you currently putting into steering towards this hoped-for good outcome?  Will your fellow scofflaws even agree with you about which outcomes would count as "good"?

I think civil disobedience is sometimes a good tactic for protesting a bad rule, but you should have at least a rough proposal for how you want the rule changed and an overall strategy for actually getting that change.  It's also exceedingly suspicious if your "civil disobedience" involves keeping a low profile and putting money into your own pocket, rather than making headlines and going to jail for it.

Comment by Dweomite on Redirecting one’s own taxes as an effective altruism method · 2023-11-13T23:23:16.717Z · LW · GW

In 2022, 8,143,000 federal tax returns were filed in which the filers failed to pay what the returns said they owed. There were also at least 413,000 taxpayers who failed to file returns (only counting the ones the I.R.S. knows about).[4] That same year, the I.R.S. successfully prosecuted 699 people for tax crimes of all sorts.[5] Even if every one of those prosecutions had been of people who merely refused to pay (or to file and pay), that would mean that an individual tax scofflaw would have had something like a 1 in 12,000 chance of being brought up on charges.

I have not checked your sources, but it sounds like the first number you are quoting is probably referring to the people who did not immediately pay their full tax when filing.  Have I misunderstood?

My models predict that the subcategory of people who continued to not pay their tax unless/until criminally convicted is substantially smaller, and that charges are vastly more likely to be brought against this subgroup than against others.  (The larger group also contains:  people who made an honest mistake; people who are legitimately but temporarily unable to pay, and will pay when they can; people who can be intimidated into paying by scary letters; people who have assets that can be easily seized.)

This suggests to me that your "1 in 12,000" number may be a rather substantial underestimate for the risk within the scofflaw subgroup.
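
To put rough numbers on this: the quoted figures do give about 1 in 12,000 if all 699 prosecutions are spread evenly over every unpaid or unfiled return. The sketch below reproduces that, then shows how the odds change if only some fraction of that pool are persistent non-payers and the prosecutions all land on them (the same generous assumption the quoted passage makes). The fractions are hypothetical placeholders chosen only to show the scaling, not estimates of the real subgroup size.

```python
# Counts quoted in the original post (2022 figures).
UNPAID_RETURNS = 8_143_000
NON_FILERS = 413_000
PROSECUTIONS = 699


def odds_one_in(population: float, prosecutions: int = PROSECUTIONS) -> float:
    """Return N such that the per-person chance of prosecution is about 1 in N,
    assuming prosecutions are spread evenly over `population`."""
    return population / prosecutions


baseline_pool = UNPAID_RETURNS + NON_FILERS
baseline = odds_one_in(baseline_pool)

# Hypothetical: only this fraction of the pool keeps refusing to pay after
# letters, payment plans, and levies, and every prosecution falls on them.
for fraction in (1.0, 0.1, 0.01):
    pool = baseline_pool * fraction
    print(f"persistent fraction {fraction:>5}: about 1 in {odds_one_in(pool):,.0f}")
```

Shrinking the relevant pool by a factor of 10 makes the per-person risk 10 times larger, so the headline figure is only as good as the assumption that all of the roughly 8.6 million are comparable to a deliberate scofflaw.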

Comment by Dweomite on Deception Chess: Game #1 · 2023-11-05T21:35:00.962Z · LW · GW

For variant 1, do you mean you'd give only the dishonest advisors access to an engine, while the honest advisor has to do without?  I'd expect that's an easy win for the dishonest advisors, for the same reason it would be an easy win if the dishonest advisors were simply much better at chess than the honest advisor.

Contrariwise, if you give all advisors access to a chess engine, that seems to me like it might significantly favor the honest advisor, for a couple of reasons:

A.  Off-the-shelf engines are going to be more useful for generating honest advice; that is, I expect the honest advisor will be able to leverage them more easily.

  • The honest advisor can just ask for a good move and directly use it; dishonest advisors can't directly ask for good-looking-but-actually-bad moves, and so need to do at least some of the search themselves.
  • The honest advisor can consult the engine to find counter-moves for dishonest recommendations that show why they're bad; dishonest advisors have no obvious way to leverage the engine at all for generating fake problems with honest recommendations.

(It might be possible to modify a chess engine, or create a custom interface in front of it, that would make it more useful for dishonest advisors; but this sounds nontrivial.)

B.  A lesson I've learned from social deduction board games is that the pro-truth side generally benefits from communicating more details.  Fabricating details is generally more expensive than honestly reporting them, and also creates more opportunities to be caught in a contradiction.

Engine assistance seems like it will let you ramp up the level of detail in your advice:

  • You can give quantitative scores for different possible moves (adding at least a few bits of entropy per recommendation)
  • You can analyze (and therefore discuss) a larger number of options in the same amount of time (though perhaps you can shorten time controls to compensate).
  • Note that the player can ask advisors for more details than the player has time to cross-check, and advisors won't know which details the player is going to pay attention to, creating an asymmetric burden.

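
One way to quantify that asymmetric burden is a toy spot-check model. Suppose (hypothetically) an advisor reports n details, k of which are fabricated, and the player verifies m of them chosen uniformly at random, reliably catching any fabrication it checks. The chance of catching at least one fabrication is then 1 - C(n-k, m) / C(n, m). This is an illustration under those assumptions, not a claim about how a real player would verify advice; it just shows that an honest advisor loses nothing from extra detail, while each additional fabrication raises a liar's risk.

```python
from math import comb


def detection_probability(n_details: int, n_fabricated: int, n_checked: int) -> float:
    """P(at least one fabricated detail is among `n_checked` details sampled
    uniformly without replacement from `n_details`).

    Toy model: assumes checks are perfect and the advisor cannot predict
    which details the player will check.
    """
    if not (0 <= n_fabricated <= n_details and 0 <= n_checked <= n_details):
        raise ValueError("counts must satisfy 0 <= k <= n and 0 <= m <= n")
    # Probability that every checked detail happens to be an honest one.
    # math.comb returns 0 when n_checked > n_details - n_fabricated.
    all_clean = comb(n_details - n_fabricated, n_checked) / comb(n_details, n_checked)
    return 1.0 - all_clean


# Hypothetical numbers: 20 reported details, the player spot-checks 3.
for fabricated in (0, 1, 5, 10):
    p = detection_probability(20, fabricated, 3)
    print(f"{fabricated:>2} fabricated -> caught with probability {p:.2f}")
```
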
Comment by Dweomite on Deception Chess: Game #1 · 2023-11-05T01:44:51.069Z · LW · GW

on a meta level I wonder whether I should have actually been less straightforward in my presentation of what I believed. In theory, there's a difference between optimizing for Alex to win, and being completely honest to Alex, and it might have been better for me to have been more strategic about my presentation. As in, not suggesting suspicious-looking moves like 30. f7, even though I thought they were right. Optimizing in someone's favor by not being completely honest with them sure is a really risky sort of thing to do, and I doubt I really could have pulled it off all that well, but it's something to take into consideration in the real-world AI scenario.

One option to mitigate the risk is to be open about what you're doing.  "I think the best move here is X, but I realize that X looks very suspicious, so I'm going to recommend that you do Y instead in order to hedge against me being dishonest."

Comment by Dweomite on Deception Chess: Game #1 · 2023-11-05T01:33:38.394Z · LW · GW

An honest advisor might say "I still think my recommendation was good, but if you're not willing to do that, then X would be an acceptable alternative."

Comment by Dweomite on What's Hard About The Shutdown Problem · 2023-10-22T05:54:13.241Z · LW · GW

I don't think anyone is saying that "always let the human shut you down" is the Actual Best Option in literally 100% of possible scenarios.

Rather, it's being suggested that it's worth sacrificing the AI's value in the scenarios where it would be correct to defend itself from being shut off, in order to be able to shut it down in scenarios where it's gone haywire and it thinks it's correct to defend itself but it's actually destroying the world.  Because the second class of scenarios seems more important to get right.

Comment by Dweomite on What's Hard About The Shutdown Problem · 2023-10-22T05:50:04.730Z · LW · GW

As I understand it, the shutdown problem isn't about making the AI correctly decide whether it ought to be shut down.  We'd surely like to have an AI that always makes correct decisions, and if we succeed at that then we don't need special logic about shutting down, we can just apply the general make-correct-decisions procedure and do whatever the correct thing is.

But the idea here is to have a simpler Plan B that will prevent the worst-case scenarios even if you make a mistake in the fully-general make-correct-decisions implementation, and it starts making incorrect decisions.  The goal is to be able to shut it down anyway, even when the AI is not equipped to correctly reason out the pros and cons of shutting down.

Comment by Dweomite on What's Hard About The Shutdown Problem · 2023-10-22T01:36:04.639Z · LW · GW

That's my understanding of why it's bad, yes.  The point of the button is that we want to be able to choose whether it gets pressed or not.  If the AI presses it in a bunch of worlds where we don't want it pressed and stops it from being pressed in a bunch of worlds where we do want it pressed, those are both bad.  The fact that the AI is trading an equal probability mass in both directions doesn't make it any less bad from our perspective.
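
A toy calculation makes the "equal probability mass" point concrete. Split the worlds by whether we want the button pressed and whether it actually gets pressed, and suppose (hypothetically) we want it pressed in 30% of worlds and, in the intended setup, it's pressed in exactly those. If the AI moves 20% of the mass in each direction, the overall chance of a press is unchanged, but the button now matches our choice in only 60% of worlds instead of 100%. All numbers here are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ButtonOutcomes:
    """Probability mass over worlds, split by whether we want the button
    pressed and whether it actually gets pressed (toy model)."""
    want_pressed: float           # we want it pressed, and it is
    want_blocked: float           # we want it pressed, but it isn't
    dont_want_pressed: float      # we don't want it pressed, but it is
    dont_want_not_pressed: float  # we don't want it pressed, and it isn't

    def p_pressed(self) -> float:
        return self.want_pressed + self.dont_want_pressed

    def p_matches_our_choice(self) -> float:
        return self.want_pressed + self.dont_want_not_pressed


def ai_shifts_mass(o: ButtonOutcomes, delta: float) -> ButtonOutcomes:
    """The AI presses the button in `delta` mass of worlds where we don't want
    it pressed, and blocks it in `delta` mass of worlds where we do."""
    if not (0 <= delta <= min(o.want_pressed, o.dont_want_not_pressed)):
        raise ValueError("delta must fit inside the mass being moved")
    return ButtonOutcomes(
        want_pressed=o.want_pressed - delta,
        want_blocked=o.want_blocked + delta,
        dont_want_pressed=o.dont_want_pressed + delta,
        dont_want_not_pressed=o.dont_want_not_pressed - delta,
    )


intended = ButtonOutcomes(0.3, 0.0, 0.0, 0.7)
manipulated = ai_shifts_mass(intended, 0.2)
print(f"P(pressed): {intended.p_pressed():.2f} -> {manipulated.p_pressed():.2f}")
print(f"P(matches what we want): {intended.p_matches_our_choice():.2f} -> "
      f"{manipulated.p_matches_our_choice():.2f}")
```
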