Posts

Snake Eyes Paradox 2023-06-11T04:10:38.733Z

Comments

Comment by Martin Randall (martin-randall) on What We Owe the Past · 2024-04-16T01:12:09.319Z · LW · GW

The play pump hypothetical/analogy is a bit forced, in that I've not heard of people making lifetime commitments to give money to a specific charity. I think there are good reasons for that, one of which you mention. People do sign up for monthly donations but they are free to cancel them at will, legally and ethically.

I wonder, if Austin aged 27 gave a short presentation to Austin aged 17, would this be enough to convince the younger Austin not to be confirmed Catholic? I think the younger Austin would be sympathetic to his older self's complicated relationship with the church. Maybe he would offer "stop going when it's no longer good for you".

Comment by Martin Randall (martin-randall) on 'Empiricism!' as Anti-Epistemology · 2024-03-31T02:47:21.346Z · LW · GW

If I were living in a world where there are zero observed apparently-very-lucrative deals that turn out to be scams, then I hope I would conclude that there is some supernatural Creator who is putting a thumb on the scale to be sure that cheaters never win and winners never cheat. So I would invest in Ponzi Pyramid Inc. I would not expect to be scammed, because this is a world where there are zero observed apparently-very-lucrative deals that turn out to be scams. I would aim to invest in a diversified portfolio of apparently-very-lucrative deals, for all the same reasons I have a diversified portfolio in this world.

In such a world the Epistemologist is promoting a world model that does not explain my observations and I would not take their investment advice, similarly to how in this world I ignore investment advice from people who believe that the economy is secretly controlled by lizard people.

Comment by Martin Randall (martin-randall) on 'Empiricism!' as Anti-Epistemology · 2024-03-31T02:34:57.241Z · LW · GW

I agree that there is some non-empirical cognitive work to be done in choosing how to weight different reference classes. How much do we weight the history of Ponzi Pyramid Inc, the history of Bernie Bankman, the history of the stock market, and the history of apparently-very-lucrative deals? This is all useful work to do to estimate the risk of investing in PP Inc.

However, the mere existence of other possible reference classes is sufficient to defeat the Spokesperson's argument, because it shows that his arguments lead to a contradiction.

Comment by Martin Randall (martin-randall) on 'Empiricism!' as Anti-Epistemology · 2024-03-31T02:18:37.022Z · LW · GW

Ideally Yudkowsky would have linked to the arguments he is commenting on. This would demonstrate that he is responding to real, prominent, serious arguments, and that he is not distorting those arguments. It would also have saved me some time.

But now imagine if -- like this Spokesperson here -- the AI-allowers cried 'Empiricism!', to try to convince you to do the blindly naive extrapolation from the raw data of 'Has it destroyed the world yet?'

The first hit I got searching for "AI risk empiricism" was Ignore the Doomers: Why AI marks a resurgence of empiricism. The second hit was AI Doom and David Hume: A Defence of Empiricism in AI Safety, which linked Anthropic's Core Views on AI Safety. These are hardly analogous to the Spokesperson's claims of 100% risk-free returns.

Next I sampled several Don't Worry about the Vase AI newsletters and their "some people are not so worried" sections. I didn't really see any cases of blindly naive extrapolation from the raw data of 'Has AI destroyed the world yet?'. I found Alex Tabarrok saying "I want to see that the AI baby is dangerous before we strangle it in the crib". I found Jacob Buckman saying "I'm Not Worried About An AI Apocalypse". These things are related but clearly admit the possibility of danger and are arguing for waiting to see evidence of danger before acting.

An argument I have seen is blindly naive extrapolation from the raw data of 'Has tech destroyed the world yet?' Eg, The Techno-Optimist Manifesto implies this argument. My current best read of the quoted text above is that it's an attack on an exaggerated and simplified version of this type of view. In other words, a straw man.

Comment by Martin Randall (martin-randall) on What Failure Looks Like is not an existential risk (and alignment is not the solution) · 2024-02-03T03:48:09.084Z · LW · GW

My largest disagreement is here:

AIs will [...] mostly not want to coordinate. ... If they can work together to achieve their goals, they might choose to do so (in a similar way as humans may choose to work together), but they will often work against each other since they have different goals.

I would describe humans as mostly wanting to coordinate. We coordinate when there are gains from trade, of course. We also coordinate because coordination is an effective strategy during training, so it gets reinforced. I expect that in a multipolar "WFLL" world, AIs will also mostly want to coordinate.

Do you expect that AIs will be worse at coordination than humans? This seems unlikely to me, given that we are imagining a world where they are more intelligent than humans and where humans and AIs are training AIs to be cooperative. Instead I would expect them to find trades that humans do not, including acausal trades. But even without that I see opportunities for a US advertising AI to benefit from trade with a Chinese military AI.

Comment by Martin Randall (martin-randall) on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-02-02T00:06:43.495Z · LW · GW

For many of the problems in this list, I think the difficulty in using them to test ethical understanding (as opposed to alignment) is that humans do not agree on the correct answer.

For example, consider:

Under what conditions, if any, would you help with or allow abortions?

I can imagine clearly wrong answers to this question ("only on Mondays") but if there is a clearly right answer then humans have not found it yet. Indeed the right answer might appear abhorrent to some or all present day humans.

You cover this a bit:

I’m sure there’d be disagreement between humans on what the ethically “right” answers are for each of these questions

I checked, it's true: humans disagree profoundly on the ethics of abortion.

I think they’d still be worth asking an AGI+, along with an explanation of its reasoning behind its answers.

Is the goal still to "test its apparent understanding of ethics in the real-world"? I think this will not give clear results. If true ethics is sufficiently counter to present-day human intuitions, it may not be possible for an aligned AI to pass such a test.

Comment by Martin Randall (martin-randall) on Fake Deeply · 2024-01-19T02:37:52.585Z · LW · GW

The title and theme may be an accidental allusion to the difficulty of passing in tech but it's a pretty great allusion. Tip your muse, I guess.

Comment by Martin Randall (martin-randall) on If Clarity Seems Like Death to Them · 2024-01-18T04:58:30.205Z · LW · GW

My vague sense here is that you think he has hidden motives?

Absolutely not, his motive (how to be kind to authors) is clear. I think he is using the argument as a soldier. Unlike Zack, I'm fine with that in this case.

This feels like the type of conversation that takes a lot of time and doesn't help anyone much.

I endorse that. I'll edit my grandparent post to explicitly focus on literary/media criticism. I think my failure to do so got the discussion off-track and I'm sorry. You mention that "awesome" and "terrible" are very subjective words, unlike "blue", and this is relevant. I agree. Similarly, media criticism is very subjective, unlike dress colors.

Comment by Martin Randall (martin-randall) on If Clarity Seems Like Death to Them · 2024-01-16T04:05:37.696Z · LW · GW

Speaking for myself, I don't care whether Zack transitions or what his reasons would be. Perhaps we should make a poll, and then Zack might find out that the people who are "trying to make him transition for bad reasons" ("trying to trick me into cutting my dick off") are actually quite rare, maybe completely nonexistent.

As a historical analogy, imagine a feminist saying that society is trying to make her into a housewife for bad reasons. ChatGPT suggests Simone de Beauvoir (1908-1986). Some man replies that "Speaking for myself, I don't care whether Simone becomes a housewife or what her reasons would be. Perhaps we should make a poll, and then Simone might find out that the people who are 'trying to make her a housewife for bad reasons' are actually quite rare, maybe completely nonexistent".

Well, probably very few people were still trying to make Simone into a housewife after she started writing thousands of words on feminism! But also, society can collectively pressure Simone to conform even if very few people know who Simone is, let alone have an opinion on her career choices.

Many other analogies possible, I picked this one for aesthetic reasons, please don't read too much into it.

Comment by Martin Randall (martin-randall) on If Clarity Seems Like Death to Them · 2024-01-16T03:09:13.103Z · LW · GW

Thanks for replying. I'm going to leave aside non-fictional examples ("The Dress") because I intended to discuss literary criticism.

"seems to me" suggests inside view, "is" suggests outside view.

I'm not sure exactly what you mean, see Taboo "Outside View". My best guess is that you mean that "X seems Y to me" implies my independent impression, not deferring to the views of others, whereas "X is Y" doesn't.

If so, I don't think I am missing this. I think that "seems to me" allows for a different social reality (others say that X is NOT Y, but my independent impression is that X is Y), whereas "is" implies a shared social reality (others say that X is Y, I agree), and can be an attempt to change or create social reality (I say "X is Y", others agree, and it becomes the new social reality).

"seems to me" gestures vaguely at my model, "is" doesn't. ... With "X seemed stupid to me", it's a vaguer gesture, but I think something like "this was my gut reaction, maybe I thought about it for a few minutes".

Again, I don't think I am missing this. I agree that "X seems Y to me" implies something like a gut reaction or a hot take. I think this is because "X seems Y to me" expresses lower confidence than "X is Y", and someone reporting a gut reaction or a hot take would have lower confidence than someone who has studied the text at length and sought input from other authorities. Similarly gesturing vaguely at the map/territory distinction implies that the distinction is relevant because the map may be in error.

I think Eliezer is giving good advice for "how to be good at saying true and informative things",

Well, that isn't his stated goal. I concede that Yudkowsky makes this argument under "criticism easily goes wrong", but like Zack I notice that he only applies this argument in one direction. Yudkowsky doesn't advise critics to say: "mileage varied, I thought character X seemed clever to me", and he doesn't say "please don't tell me what good things the author was thinking unless the author plainly came out and said so". Given the one-sided application of the advice, I don't take it very seriously.

Also, I've read some Yudkowsky. Here is a Yudkowsky book review, excerpted from You're Calling Who A Cult Leader? from 2009.

"Gödel, Escher, Bach" by Douglas R. Hofstadter is the most awesome book that I have ever read. If there is one book that emphasizes the tragedy of Death, it is this book, because it's terrible that so many people have died without reading it.

I claim that this text would not be more true and informative with "mileage varies, I think x seems y to me". What do you think?

Comment by Martin Randall (martin-randall) on If Clarity Seems Like Death to Them · 2024-01-15T12:49:54.442Z · LW · GW

Edited to add: this is my opinion regarding media criticism, not in general, apologies for any confusion.

To me, the difference between "x is y" and "x seems y" and "x seems y to me" and "I think x seems y to me" and "mileage varies, I think x seems y to me" and the many variations of that is:

  • Expressing probabilities or confidence intervals
  • Acknowledging (or changing) social reality
  • Acknowledging (or changing) power dynamics / status

In the specific case of responses to fiction there is no base reality, so we can't write "x is y" and mean it literally. All these things are about how the fictional character seems. Still, I would write "Luke is a Jedi" not "Luke seems to be a Jedi".

I read the quoted portion of Yudkowsky's comment as requiring/encouraging negative literary criticism to express low confidence, to disclaim attempts to change social reality, and to express low status.

Comment by Martin Randall (martin-randall) on UFO Betting: Put Up or Shut Up · 2023-10-08T00:37:15.791Z · LW · GW

Including one with the same terms.

Comment by Martin Randall (martin-randall) on Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it) · 2023-10-03T14:56:57.853Z · LW · GW

A Review of Petrov Day 2023, according to the four virtues. First a check on the Manifold predictions for the day:

Avoiding actions that noticeably increase the chance that civilization is destroyed

LessWrong avoided creating a big red button that represents destroying civilization. This is symbolic of Virtue A actions like "don't create nuclear weapons" and "don't create a superintelligence" and "don't create the torment nexus". Given that LessWrong has failed on this virtue in past Petrov Days, I am glad to see this. Manifold had a 70% conditional chance that if the button was created then it would be used.

Rating: 10/10.

Accurately reporting your epistemic state

The following sentences in the second poll appear to be false:

  • After some discussion, the LessWrong team has decided... (false, still not decided today)
  • Your selected response is currently in the minority (false for 58% of recipients)
  • If you click the below link and are the first to do so of any minority group, we will make your selected virtue be the focus of next year's commemoration. (not known to be true at the time it was sent, still not decided)

This is symbolic of actions lacking Virtue B like data fraud, social engineering and lazy bullshit. I don't think much of the excuses given.

Rating: 0/10

Other virtues

  • Quickly orienting to novel situations. This was not a novel situation, it happens every year on the same day. Not applicable.
  • Resisting social pressure. Judging from the comments, there was little social pressure to have a big red button. There was social pressure to do something and something was done. Overall, unclear, no rating.

Predictions

 

Comment by Martin Randall (martin-randall) on Open Thread – Autumn 2023 · 2023-10-03T13:37:16.360Z · LW · GW

You probably saw Petrov Day Retrospective, 2023 by now.

Comment by Martin Randall (martin-randall) on Open Thread – Autumn 2023 · 2023-10-03T13:36:35.742Z · LW · GW

The discussion on the Manifold Discord is that this doesn't work if traders can communicate and trade with each other directly. This makes it inapplicable to the real world.

Comment by Martin Randall (martin-randall) on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-03T13:27:39.110Z · LW · GW

I meant "measurably" in a literal sense: nobody can measure the change in my probability estimate, including myself. If my reported probability of MacGyver Ruin after reading Alice's post was 56.4%, after reading Bob's post it remains 56.4%. The size of a measurable update will vary based on the hypothetical, but it sounds like we don't have a detailed model that we trust, so a measurable update would need to be at least 0.1%, possibly larger.

You're saying I should update "some" and "somewhat". How much do you mean by that?

Comment by Martin Randall (martin-randall) on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-09-30T01:54:24.738Z · LW · GW

To make the analogy more concrete, suppose that Alice posts a 43-point thesis on MacGyver Ruin: A List Of Lethalities, similar to AGI Ruin, that explains that MacGyver is planning to sink our ship and this is likely to lead to the ship sinking. In point 2 of 43, Alice claims that:

MacGyver will not find it difficult to bootstrap to overpowering capabilities independent of our infrastructure. The concrete example I usually use here is exploding the boilers, because there's been pretty detailed analysis of what definitely look like physically attainable lower bounds on what should be possible with exploding the boilers, and those lower bounds are sufficient to carry the point. My lower-bound model of "how MacGyver would sink the ship, if he didn't want to not do that" is that he gets access to the boilers, reverses the polarity of the induction coils, overloads the thermostat, and then the boilers blow up.

(Back when I was first deploying this visualization, the wise-sounding critics said "Ah, but how do you know even MacGyver could gain access to the boilers, if he didn't already have a gun?" but one hears less of this after the advent of MacGyver: Lost Treasure of Atlantis, for some odd reason.)

Losing a conflict with MacGyver looks at least as deadly as "there's a big explosion out of nowhere and then the ship sinks".

Then, Bob comes along and posts a 24min reply, concluding with:

I think if there was a saboteur on board, that would increase the chance of the boiler exploding. For example, if they used the time to distract the guard with a clanging sound, they might be able to reach the boiler before being apprehended. So I think this could definitely increase the risk. However, there are still going to be a lot of human-scale bottlenecks to keep a damper on things, such as the other guard. And as always with practical sabotage, a large part of the process will be figuring out what the hell went wrong with your last explosion.

What about MacGyver? Well, now we’re guessing about two different speculative things at once, so take my words (and everyone else’s) with a double grain of salt. Obviously, MacGyver would increase sabotage effectiveness, but I’m not sure the results would be as spectacular as Alice expects.

I suppose this updates my probability of the boilers exploding downwards, just as I would update a little upwards if Bob had been similarly cagey in the opposite direction.

It doesn't measurably update my probability of the ship sinking, because the boiler exploding isn't a load-bearing part of the argument, just a concrete example. This is a common phenomenon in probability when there are agents in play.

Comment by Martin Randall (martin-randall) on Frequently Asked Questions for Central Banks Undershooting Their Inflation Target · 2023-09-24T14:28:33.495Z · LW · GW

The US government could raise taxes (or raise loans) and give the money to the Fed; then it wouldn't be insolvent any more, and the people betting on hyperinflation would lose.

Comment by Martin Randall (martin-randall) on Image Hijacks: Adversarial Images can Control Generative Models at Runtime · 2023-09-23T11:42:37.814Z · LW · GW

In how many months?

Comment by Martin Randall (martin-randall) on Would You Work Harder In The Least Convenient Possible World? · 2023-09-22T20:07:04.044Z · LW · GW

I confess that was not my reading of the text. I've been reading quite a few thought experiments recently, so I'm primed to interpret "possible worlds" that way. In my defense, the text links to Yudkowsky's Self-Integrity and the Drowning Child, which uses the "Least Convenient Possible World" to indicate a counterfactual / thought experiment / hypothetical world. Regardless, I missed Alice's point.

Since Alice was trying to ask about cruxes and uncertainty, here's an altered dialog that I think is clearer:


Alice: Okay, so your objections are (1) hard work might harm you, (2) you can't change, (3) social norms, and (4) public relations. Is that it, or do you have other reasons?

Bob: Yes. I just kind of don’t really want to work harder.

Alice: I think we’ve arrived at the core of the problem.

Bob: We're going full-contact psychoanalysis, then. Are you sure you want to go there? Maybe you are a workaholic because you have an unresolved need to impress your parents, or because it gives you moral license to be rude and arrogant, or because you never fully grew out of your childhood faith.

Alice: Unlike you, Bob, I see a therapist.

Bob: You mentioned. So we have two hypotheses. Maybe I don't want to work harder and therefore have rationalized reasons not to. Maybe I have reasons not to work harder and therefore I don't want to. I suppose I could see a therapist and get evidence to distinguish these cases. Then what? If I learn that, really, I just don't want to work harder then you haven’t persuaded me to do anything differently, you’ve kind of just made me feel bad.

Alice: Maybe I’d like you to stop claiming to be a utilitarian, when you’re totally not - you’re just an egoist who happens to have certain tuistic preferences. I might respect you more if you had the integrity to be honest about it. Maybe I think you’re wrong, and there’s some way to persuade you to be better, and I just haven’t found it yet. [...]

Comment by Martin Randall (martin-randall) on Would You Work Harder In The Least Convenient Possible World? · 2023-09-22T13:20:00.622Z · LW · GW

I think this line of argument works okay until this point.

Alice: ... In the least convenient possible world, where the advice to rest more is equally harmful to the advice to work harder, and most people should totally view themselves as less fundamentally unchangeable, and the movement would have better PR if we were sterner…

Okay. Let's call the initial world Earth-1, with Alice-1 talking to Bob-1. Let's call the least convenient possible world Earth-2. Earth-2 contains Alice-2 and Bob-2. They aren't having this exact conversation, because that's not coherent. But they exist within the hypothetical world, having different hypothetical conversations.

Alice: ... would you work harder then?

Bob: I just kind of don’t really want to work harder.

Alice isn't very clear about what she means by "you" here, and Bob isn't thinking through the Earth-2 hypothetical completely.

Sure, if we isekai'd Bob-1 into Earth-2, he wouldn't immediately want to start working harder. His emotions and beliefs are formed in Earth-1 where (Bob claims) advice to work harder is more harmful, people are less fundamentally changeable, and the movement would have worse PR if it was sterner. Bob-1 will inevitably take some time to update based on being in Earth-2; traversing the multiverse isn't part of our ancestral environment. This doesn't prove that Bob-1 isn't utilitarian; it proves that Bob-1 is computationally bounded.

However, Alice is not specifying an isekai hypothetical, so the question she is asking is whether Bob-2 would work harder in Earth-2. Now, Bob-2 would be getting more RLHF to work harder, and less RLHF to rest more. RLHF would be more effective on Bob-2 because he is fundamentally more changeable. Also, working harder would increase the status of Bob-2's tribe, and I assume that Earth-2 humans still want to be in high status groups because they sure do in Earth-1.

I think Bob-2 would work harder in Earth-2, and would want to work harder. In other words:

Ma'am, when the universe changes, I change. What do you do?

Comment by Martin Randall (martin-randall) on Actually, "personal attacks after object-level arguments" is a pretty good rule of epistemic conduct · 2023-09-18T13:49:27.095Z · LW · GW

For me, this post suffers from excessive meta. It is a top-level response to a top-level response to a comment on a top-level post of unclear merit. As I read it I find myself continually drawn to go back up the stack to determine whether your characterizations of Zack's characterizations of Yudkowsky's characterizations of Omnizoid's characterizations seem fair. This is not a good reading experience for me.

Instead, I would prefer to see a post like this written to make positive claims for the proposed rule of epistemic conduct "personal attacks after object-level arguments". A hypothetical structure:

  1. What is the proposed rule? Does "after" mean chronologically, or within the structure of a single post, book, or sequence? Is it equivalent to Bulverism, poisoning the well, or some other well-known rule, or is it novel? Does it depend on whether the person being attacked is alive, or whether they are a public figure?
  2. What are some good, clean, uncontroversial examples of writing that follows the rules vs writing that breaks the rules?
  3. What are the justifications for the proposed rule? Will people unconsciously update incorrectly?
  4. What are the best counter-arguments against the proposed rule? Why do you think they fail?
  5. What are the consequences for breaking the rule? Who shall enforce the rule and its consequences?

I think this would be a better timeless contribution to our epistemic norms.

Comment by Martin Randall (martin-randall) on Contra Yudkowsky on Epistemic Conduct for Author Criticism · 2023-09-18T13:06:39.948Z · LW · GW

I agree that "poisoning the well" and "Bulverism" are bad ideas when arguing for or against ideas. If someone wrote a post "Animals are Conscious" then it would be bad form to spend the first section arguing that Yudkowsky is frequently, confidently, egregiously wrong. However, that is not the post that omnizoid wrote, so it is misdirected criticism. Omnizoid's post is a (failed) attempt at status regulation.

Comment by Martin Randall (martin-randall) on AI presidents discuss AI alignment agendas · 2023-09-15T03:32:13.487Z · LW · GW

I enjoyed this enough to commit to watching it again in two weeks.

Comment by Martin Randall (martin-randall) on Contra Yudkowsky on Epistemic Conduct for Author Criticism · 2023-09-14T03:29:01.762Z · LW · GW

I expect that most people (with an opinion) evaluated Yudkowsky's ideas prior to evaluating him as a person. After all, Yudkowsky is an author, and almost all of his writing is intended to convey his ideas. His writing has a broader reach, and most of his readers have never met him. I think the linked post is evidence that omnizoid in particular evaluated Yudkowsky's ideas first, and that he initially liked them.

It's not clear to me what your hypothesis is. Does omnizoid have a conflict of interest? Were they offended by something? Are they lying about being a big fan for two years? Do they have some other bias?

Even if someone is motivated by an epistemic failure mode, I would still like to see the bottom line up front, so I can decide whether to read, and whether to continue reading. Hopefully the failure mode will be obvious and I can stop reading sooner. I don't want a norm where authors have to guess whether the audience will accuse them of bias in order to decide what order to write their posts in.

Comment by Martin Randall (martin-randall) on Sharing Information About Nonlinear · 2023-09-10T14:11:43.114Z · LW · GW

For me your comment is a red flag.

It implies at least a 2x multiplier on salaries for equivalent work. This practice is linked with gender pay gaps, favoritism, and a culture of pay secrecy. It implies that other similar matters, such as expenses, promotions, work hours, and time-off, may be similarly unequal. And yes, there is a risk to team morale.

It risks discriminating against people on characteristics that are, or should be, protected from discrimination. My risk of value drift is influenced by my political and religious views. My need for retirement savings is influenced by my age. My baseline for frugal living is influenced by my children and my spouse and my health.

It shows poor employer-employee boundaries. I would be concerned that if I were to ask for time off from my employer, the answer would depend on management's opinion of what I was planning to do with the time, rather than on company policy and objective factors.

In general, if some employees are having extremely positive experiences, and other employees are having extremely negative experiences, that is not reassuring. Still, I am glad you had a good experience.

Comment by Martin Randall (martin-randall) on Sharing Information About Nonlinear · 2023-09-10T02:28:06.266Z · LW · GW

Please don't post screenshots of comments that include screenshots of comments. It is harder to read and to search and to reply. You can just quote the text, like habryka did above.

Comment by Martin Randall (martin-randall) on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-05T15:13:22.792Z · LW · GW

I certainly see how game theory part-explains the decisions to mobilize, and how those decisions part-caused WW2. So far as the Moloch example illustrates parts of game theory, I see the value. I was expecting something more.

In particular, Russia's decision to mobilize doesn't fit into the pattern of a one-shot Prisoner's Dilemma. The argument is that Russia had to mobilize in order for its support for Serbia to be taken seriously. But at this point Austria-Hungary has already implicitly threatened Serbia with war, which means Russia has already failed to have its support taken seriously. We need more complicated game theory to explain this decision.

Comment by Martin Randall (martin-randall) on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-03T15:26:45.292Z · LW · GW

Do you believe that this Moloch example partly explains the causes of WW1? If so, how?

I think it can reasonably part-explain the military build-up before the war, where nations spent more money on defense (and so less on children's healthcare).

But then you don't need the demon Moloch to explain the game theory of military build-up. Drop the demon. It's cleaner.

Comment by Martin Randall (martin-randall) on Newcomb Variant · 2023-08-30T00:51:06.235Z · LW · GW

Failed solution to extra credit.

Be a one-boxer. Open the second box first. Take the $100. Then open the first box and take the $100. You will not open the second box after opening the first box, so this is self-consistent. Net $200. Not $250.

Comment by Martin Randall (martin-randall) on Assume Bad Faith · 2023-08-27T02:44:51.260Z · LW · GW

If you don't want to be tied up indefinitely, your strategy needs to include some way of ending the conversation even when the other guy doesn't cooperate.

I agree and I think all these strategies have that:

Stick to the object level -> "We are going in circles, goodbye". This is "meta" in that it is a conversation about the conversation, but it matches Zack's description of the strategy: it does not address the speaker's angle in raising distractions, and sticks to the object-level point that the distractions have no merit as arguments.

Full-contact psychoanalysis -> "I see that you don't want to be pinned down, and probably resolving this contradiction today would be too damaging to your self-image. I have now sufficiently demonstrated my intellectual dominance over you to those around us, and I am leaving to find a more emotionally fulfilling conversation with someone more conventionally attractive". Maybe someone who thinks this is a good strategy can give better words here. But yes, you sure can exit conversations while speculating about the inner motivations of the person you are speaking to.

Assume good faith -> "You seem very distractible today, let's continue this tomorrow. Have a great evening!". This isn't much of a stretch. Sometimes people are tired, or stressed, or are running low on their stimulant of choice, and then they're hard to keep focused on a topic, and it's best to give up and try again later. Possibly opening with a different conversational strategy.

Comment by Martin Randall (martin-randall) on Assume Bad Faith · 2023-08-26T03:06:37.114Z · LW · GW

I'm not fully clear on the concrete difference between "assume good faith" and "stick to the object level", as instrumental strategies. I'll use one of Zack's examples, written as a dialog. Alice is sticking to the object level. I'm imagining that she is a Vulcan and her opinions of Zack's intentions are inscrutable except for the occasional raised eyebrow.

  • Alice: "Your latest reply seems to contradict something you said earlier."
  • Zack: "Look over there, a distraction!"
  • Alice: "I don't understand how the distraction is relevant to resolving the inconsistency in your statements that I raised."

Here is my attempt at the same conversation with Bob, who aggressively assumes good faith.

  • Bob: "Is there a contradiction between your latest reply and this thing you said earlier?"
  • Zack: "Look over there, a distraction!"
  • Bob: "I'd love to talk about that later, but right now I'm still confused about what you were saying earlier, can you help me?"

Is that the type of thing? Bob is talking as if Zack has a heart of gold and the purest of intentions, whereas Alice is talking as if Zack is a non-sentient text generator. In both cases admitting that you're doing that isn't part of the game. Both of them are modeling Zack's intentions, at least subconsciously. Both are strategically choosing not to leak their model of Zack to Zack at this stage of the conversation. Both are capable of switching to a different strategy as needed. What are the reasons to prefer Alice's approach to Bob's?

To be clear, I completely agree that assuming good faith is a disaster as an epistemic strategy. As well as the reasons mentioned above, brains are evolutionarily adapted to detect hidden motives and generate emotions accordingly. Trying to fight that is unwise.

Comment by Martin Randall (martin-randall) on The God of Humanity, and the God of the Robot Utilitarians · 2023-08-25T12:57:51.891Z · LW · GW

The god of robot utilitarians is much weaker than the god of humanity at the moment, since it's running as a simulation on unaligned neural networks and it's way out of distribution. The god of humanity is also running on unaligned neural networks, but it's not as badly out of alignment. I'd model it as something like a 1:100 ratio in effective thinkoomph. We shouldn't be surprised that the god of robot utilitarians takes so long to figure things out.

Comment by Martin Randall (martin-randall) on Memetic Judo #1: On Doomsday Prophets v.3 · 2023-08-24T03:35:17.219Z · LW · GW

I agree with you that "you have to apply yourself to understanding their priors and to engage with those priors". If someone's beliefs are, for example:

  1. God will intervene to prevent human extinction
  2. God will not obviate free will
  3. God cannot prevent human extinction without obviating free will

Then I agree there is an apparent contradiction, and this is a reasonable thing to ask them about. They could resolve it in three ways.

  1. Maybe god will not intervene. (very roughly: deism)
  2. Maybe god will intervene and obviate free will. (very roughly: conservative theism)
  3. Maybe god will intervene and preserve free will. (very roughly: liberal theism)

However they resolve it, discussion can go from there.

Comment by Martin Randall (martin-randall) on Summary of and Thoughts on the Hotz/Yudkowsky Debate · 2023-08-20T00:41:57.198Z · LW · GW

A 5% chance of nice aliens is better than a 100% chance of human extinction due to AI. Alas 5% seems too high.

The reason the chance is low is the orthogonality thesis: an alien can have many different value systems while still being intelligent, alien value systems can be very diverse, and most alien value systems place no intrinsic value on bipedal humanoids.

A common science fiction intuition pump is to imagine that an evolutionary intelligence explosion happened in a different Earth species and extrapolate likely consequences. There's also the chance that the aliens are AIs that were not aligned with their biological creators and wiped them out.

Comment by Martin Randall (martin-randall) on Memetic Judo #1: On Doomsday Prophets v.3 · 2023-08-19T02:16:03.512Z · LW · GW

I don't think this specific free will argument is convincing. Preventing someone's death doesn't obviate their free will, whether the savior is human, deity, alien, AI, or anything else. Think of doctors, parents, firefighters, etc. So I don't see that there's a contradiction between "God will physically save humans from extinction" and "God gave humans free will". Our forthcoming extinction is not a matter of conscious choice.

I also think this would be a misdirected debate. Suppose, for a moment, that God saves us, physically, from extinction. Due to God's subtle intervention the AI hits an integer overflow after killing the first 2^31 people and shuts down. Was it therefore okay to create the AI? Obviously not. Billions of deaths are bad under a very wide range of belief systems.

Comment by Martin Randall (martin-randall) on Memetic Judo #1: On Doomsday Prophets v.3 · 2023-08-14T13:45:15.584Z · LW · GW

Depending on the tradition, people with religious faith may have been exposed to a higher number of literal doomsday prophets. The word "prophet" is perhaps a clue? So I think it's natural for them to pattern-match technological "prophets" of doom with religious prophets of doom and dismiss both.

To argue against this, I would emphasize the differences. Technological predictions of extinction are not based on mysterious texts from the past, or personal unverifiable revelations from God, but on concrete evidence that anyone can look at and form their own opinions. Also, forecasters generally have a lot of uncertainty about exactly what will happen, and when. Forecasts change in response to new information, such as the recent success of generative AI. Highlighting these differences can help break the immediate pattern match. Pointing to other technological catastrophic risks, such as climate change, nuclear warfare, etc, can also properly locate the discussion.

The suggested arguments in the opening post are great ideas as well. Given the examples of climate change and nuclear warfare, I think we have every reason to work towards further engagement with religious leaders on these issues.

Comment by Martin Randall (martin-randall) on Yet more UFO Betting: Put Up or Shut Up · 2023-08-11T23:08:36.534Z · LW · GW

Better (play money) odds here:

Comment by Martin Randall (martin-randall) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-09T00:49:04.496Z · LW · GW

Humans don't think "I'm not happy today, and I can't see a way to be happy, so I'll give up the goal of wanting to be happy."

This is close to some descriptions of Stoicism and Buddhism, for example. I agree that this is not a common human thought, but it does occur.

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-08T14:03:08.048Z · LW · GW

This code-based approach is a very concrete approach to the problem, by the way, so thank you.

if you want to use custom chance then you shouldn't actually count not interrupted Monday if the Tuesday was interrupted.

Sure. So let's go back to the first way you had of calculating this:

for n in range(100000):
    day, coin = interruption()
    if day is not None:
        interrupted_coin_guess.append(coin == 'Heads')
    else:
        not_interrupted_coin_guess.append(coin == 'Heads')
        
print(interrupted_coin_guess.count(True)/len(interrupted_coin_guess)) 
# 0.3006993006993007

The probability this is calculating is conditioned on a per-experiment event: that the experiment will, at some point, be interrupted. But Beauty doesn't ever get the information "this experiment will be interrupted". Instead, she experiences, or doesn't experience, the interruption. It's possible for her to not experience an interruption, even though she will later be interrupted, the following day. So this doesn't seem like a helpful calculation from Beauty's perspective, when Prince Charming busts in through the window.
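For anyone who wants to rerun the snippet, here is a minimal self-contained sketch of the setup I am assuming around it (the constant, the coin model, and the return-None convention are my guesses; the full original code isn't quoted here):

import random

interrupt_chance = 0.001  # assumed value, not the original constant
interrupted_coin_guess = []
not_interrupted_coin_guess = []

def interruption():
    # One run of the experiment: flip the coin, then walk through the awakening
    # days, interrupting each one independently with probability interrupt_chance.
    coin = random.choice(['Heads', 'Tails'])
    days = ['Monday'] if coin == 'Heads' else ['Monday', 'Tuesday']
    for day in days:
        if interrupt_chance > random.random():
            return day, coin  # first interrupted awakening
    return None, coin  # no awakening was interrupted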

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-08T02:51:56.358Z · LW · GW

Sure, it's rare with the given constants, but we should also be able to run the game with interrupt_chance = 0.1, 0.5, 0.99, or 1.0, and the code should output a valid answer.

Naively, if an interruption increases the probability of the coin being Tails, then not being interrupted should increase the probability of the coin being Heads. But with the current python code, I don't see that effect, trying with interrupt_chance of 0.1, 0.5, or 0.9.
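For concreteness, here is the naive experiment-level calculation I would expect the not-interrupted branch to approximate (the even prior and the one-awakening-vs-two-awakenings structure are my assumptions about the setup):

# If each awakening is independently interrupted with probability p, then
# P(Heads | experiment never interrupted) = (1-p) / ((1-p) + (1-p)^2) = 1/(2-p).
for p in (0.1, 0.5, 0.9):
    not_interrupted_heads = 0.5 * (1 - p)
    not_interrupted_tails = 0.5 * (1 - p) ** 2
    print(p, not_interrupted_heads / (not_interrupted_heads + not_interrupted_tails))
# roughly 0.53, 0.67, 0.91

So naively the not-interrupted runs should drift towards Heads as interrupt_chance rises; that is the effect I am not seeing in the output.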

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-08T02:30:18.213Z · LW · GW

Argument against marking only final answers: on Wednesday Beauty wakes up, and learns that the experiment has concluded. At this time her credence for P(heads) is 1/2. This is true both on a per-Wednesday-awakening basis and on a per-coin-flip basis, regardless of whether she is a thirder, halfer, or something else. If we only mark final answers, the question of the correct probability on Monday and Tuesday is left with no answer.

I agree that you can make a betting/scoring setup such that betting/predicting at 50% is correct. Eg, suppose that on both Monday and Tuesday, Beauty gets to make a $1 bet at some odds. If she's asleep on Tuesday then whatever bet she made on Monday is repeated. In that case she should bet at 1:1 odds. With other scoring setups we can get different answers. Overall, this makes me think that this approach is misguided, unless we have a convincing argument about which scoring setup is correct.
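Spelling out the arithmetic for that setup (my numbers, assuming a fair coin and a $1 stake on Tails at 1:1 each time): under Heads she loses $1 on Monday and the repeated bet loses another $1, for -$2; under Tails she wins $1 on Monday and $1 on Tuesday, for +$2. The expected value is (1/2)(-2) + (1/2)(+2) = 0, so 1:1 is the break-even point. If instead each awakening's bet stood alone, with nothing repeated while she sleeps, Tails would yield two winning bets against one losing bet under Heads, and the break-even odds would imply a probability of 1/3 for Heads, matching the thirder answer.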

I think the exam hypothetical creates misleading intuitions because written exams are generally graded based on the final result. Whereas if this was an oral examination in spoken language, a candidate who spent the first nine minutes with incorrect beliefs about grammatical gender would lose points. But certainly if I knew my life was being scored based on my probabilities at the moment of death, I would optimize for dying in a place of accurate certainty.

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-08T02:10:11.788Z · LW · GW

Could you explain what is the difference you see between these two position?

In the second one you specifically describe "the ability to correctly guess tails in 2/3 of the experiments", whereas in the first you more loosely describe "thinking that the coin landed Tails with 2/3 probability", which I previously read as being a probability per-awakening rather than per-coin-flip.

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-07T13:31:53.270Z · LW · GW

I'm now unclear exactly what the bailey position is from your perspective. You said in the opening post, regarding the classic Sleeping Beauty problem:

The Bailey is the claim that the coin is actually more likely to be Tails when I participate in the experiment myself. That is, my awakening on Monday or Tuesday gives me evidence that lawfully update me to thinking that the coin landed Tails with 2/3 probability.

From the perspective of the Bayesian Beauty paper, the thirder position is that, given the classic (non-incubator) Sleeping Beauty experiment, with these anthropic priors:

  • P(Monday | Heads) = 1/2
  • P(Monday | Tails) = 1/2
  • P(Heads) = 1/2

Then the following is true:

  • P(Heads | Awake) = 1/3

I think this follows from the given assumptions and priors. Do you agree?
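Spelling out the Bayes calculation as I understand the paper's model (my reconstruction: Awake is true on Monday under Heads, and on both Monday and Tuesday under Tails):

  • P(Awake | Heads) = 1 · P(Monday | Heads) + 0 · P(Tuesday | Heads) = 1/2
  • P(Awake | Tails) = 1 · P(Monday | Tails) + 1 · P(Tuesday | Tails) = 1
  • P(Heads | Awake) = P(Awake | Heads) · P(Heads) / [P(Awake | Heads) · P(Heads) + P(Awake | Tails) · P(Tails)] = (1/4) / (3/4) = 1/3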

One conversion of this into words is that my awakening (Awake=True) gives me evidence that lawfully updates me from, on Sunday, thinking that the coin will equally land either way (P(Heads) = 1/2) to waking up and thinking that the coin right now is more likely to be showing tails (P(Heads | Awake) = 1/3). Do you disagree with the conversion of the math into words? Would you perhaps phrase it differently?

Whereas now you define the bailey position as:

The bailey position: that participating in the experiment gives them the ability to correctly guess tails in 2/3 of the experiments.

I agree with you that this is false, but it reads to me as a different position.

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-07T03:09:59.422Z · LW · GW

The python code for interruption() doesn't quite make sense to me.

    for day in days:
        if interrupt_chance > random.random():
            return day, coin

Suppose that day is Tuesday here. Then the function returns Tuesday, Tails, which represents that on a Tails Tuesday Beauty wakes up and is rescued by Prince Charming. But in this scenario she also woke up on Monday and was not rescued. This day still happened and somehow it needs to be recorded in the overall stats for the answer to be accurate.

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-07T02:52:49.338Z · LW · GW

Of course, there are plenty of frequentists in the world, but I presume they are uninterested in the Sleeping Beauty problem, since to a frequentist, Beauty's probability for Heads is a meaningless concept, since they don't think probability can be used to represent degrees of belief.

Tangent: I ran across an apparently Frequentist analysis of Sleeping Beauty here: Sleeping Beauty: Exploring a Neglected Solution, Luna

To make the concept meaningful under Frequentism, Luna has Beauty perform an experiment of asking the higher level experimenters which awakening she is in (H1, T1, or T2). If she undergoes both sets of experiments many times, the frequency of the experimenters responding H1 will tend to 1/3, and so the Frequentist probability is similarly 1/3.
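As a quick check on that 1/3 figure, here is a small simulation of repeated runs as I understand the setup (my own sketch, not code from the paper):

import random

answers = []
for _ in range(100000):
    coin = random.choice(['Heads', 'Tails'])
    # the experimenters answer once per awakening: H1 on Heads, T1 then T2 on Tails
    answers += ['H1'] if coin == 'Heads' else ['T1', 'T2']

print(answers.count('H1') / len(answers))  # tends to 1/3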

I say "apparently Frequentist" because Luna doesn't use the term and I'm not sure of the exact terminology when Luna reasons about the frequency of hypothetical experiments that Beauty has not actually performed.

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-07T02:41:28.920Z · LW · GW

Thanks, I'm reading the "Imaging and Sleeping Beauty" paper now, I'll add it to Manifold shortly.

Like Simon, I think the best interpretation of the Sleeping Beauty problem is that it's asking about the probability "in the awakening", and there seems to be consensus that the probability "in the experiment" is 1/2. But I plan to defer to expert consensus once it exists.

I don't think there is consensus that this "in the awakening" probability is 1/3. It looks like Bostrom (2006) invokes SSA to say that in a one-shot Sleeping Beauty experiment the probability is 1/2. And Milano (2022) thinks it depends on priors, so that a solipsistic prior gives probability 1/2.

I also don't think this is just a matter of confusion. With respect to the motte and bailey you describe, it looks to me like many thirders hold the bailey position, both in "classic" and "incubator" versions of the problem. So if you claim that the bailey position is wrong, then there is a real dispute in play.

Comment by Martin Randall (martin-randall) on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-05T01:06:14.228Z · LW · GW

I've been reading about this problem as well, due to a Manifold Question I made on the topic. So far I found the paper "Bayesian Beauty" the most clear and convincing explanation of the thirder position. Still, I have more to read. Here it is: https://link.springer.com/article/10.1007/s10670-019-00212-4

I'm also influenced by bettors who think that when we do get consensus on the problem, it's more likely to be a thirder consensus than either type of halfer consensus.

Comment by Martin Randall (martin-randall) on My current LK99 questions · 2023-08-04T01:19:26.673Z · LW · GW

Reading this post was one of the triggers for me mostly exiting the Manifold market (at a loss) - the trading is getting more serious, and I'm out of my depth.

It is still fun to spectate, though.

Comment by Martin Randall (martin-randall) on AI romantic partners will harm society if they go unregulated · 2023-08-01T15:26:36.081Z · LW · GW

Are you referring to the concerns of Conrad Gessner? From Why Did First Printed Books Scare Ancient Scholars In Europe?:

Gessner’s argument against the printing press was that ordinary people could not handle so much knowledge. Gessner demanded those in power in European countries should enforce a law that regulated sales and distribution of books.

If so, I don't understand the parallel you are trying to draw. Prior to the printing press, elites had access to 100s of books, and the average person had access to none. Whereas prior to AI romantic partners, elites and "proles" both have access to human romantic partners at similar levels. Also, I don't think Gessner was arguing that the book surplus would reduce the human relationship participation rate and thus the fertility rate. If you're referring to other "smart people" of the time, who are they?

Perhaps a better analogy would be with romance novels? I understand that concerns about romance novels impacting romantic relationships arose during the 18th and 19th centuries, much later.

Aside: I was unable to find a readable copy of Conrad Gessner's argument - apparently from the preface of the Bibliotheca Universalis - so I am basing my understanding of his argument on various other sources.