Posts

Specialized Labor and Counterfactual Compensation 2020-11-14T18:13:43.044Z
Against boots theory 2020-09-14T13:20:04.056Z
Classifying games like the Prisoner's Dilemma 2020-07-04T17:10:01.965Z
Short essays on various things I've watched 2020-06-12T22:50:01.957Z
Haskenthetical 2020-05-19T22:00:02.014Z
Chris Masterjohn on Coronavirus, Part 2 2020-04-28T21:50:01.430Z
In my culture: the responsibilities of open source maintainers 2020-04-13T13:40:01.174Z
Chris Masterjohn on Coronavirus, Part 1 2020-03-29T11:00:00.819Z
My Bet Log 2020-03-19T21:10:00.929Z
Tapping Out In Two 2019-12-05T23:10:00.935Z
The history of smallpox and the origins of vaccines 2019-12-01T20:51:29.618Z
The Effect pattern: Transparent updates in Elm 2019-10-20T20:00:01.101Z
London Rationalish meetup (part of SSC meetups everywhere) 2019-09-12T20:32:52.306Z
Is this info on zinc lozenges accurate? 2019-07-27T22:05:11.318Z
A reckless introduction to Hindley-Milner type inference 2019-05-05T14:00:00.862Z
"Now here's why I'm punching you..." 2018-10-16T21:30:01.723Z
Pareto improvements are rarer than they seem 2018-01-27T22:23:24.206Z
2017-10-08 - London Rationalish meetup 2017-10-04T14:46:50.514Z
Authenticity vs. factual accuracy 2016-11-10T22:24:38.810Z
Costs are not benefits 2016-11-03T21:32:07.811Z
GiveWell: A case study in effective altruism, part 1 2016-10-14T10:46:23.303Z
Six principles of a truth-friendly discourse 2016-10-08T16:56:59.994Z
Diaspora roundup thread, 23rd June 2016 2016-06-23T14:03:32.105Z
Diaspora roundup thread, 15th June 2016 2016-06-15T09:36:09.466Z
The Sally-Anne fallacy 2016-04-11T13:06:10.345Z
Meetup : London rationalish meetup - 2016-03-20 2016-03-16T14:39:40.949Z
Meetup : London rationalish meetup - 2016-03-06 2016-03-04T12:52:35.279Z
Meetup : London rationalish meetup, 2016-02-21 2016-02-20T14:09:42.635Z
Meetup : London Rationalish meetup, 7/2/16 2016-02-04T16:34:13.317Z
Meetup : London diaspora meetup: weird foods - 24/01/2016 2016-01-21T16:45:10.166Z
Meetup : London diaspora meetup, 10/01/2016 2016-01-02T20:41:05.950Z
Stupid questions thread, October 2015 2015-10-13T19:39:52.114Z
Bragging thread August 2015 2015-08-01T19:46:45.529Z
Meetup : London meetup 2015-05-14T17:35:18.467Z
Group rationality diary, May 5th - 23rd 2015-05-04T23:59:39.601Z
Meetup : London meetup 2015-05-01T17:16:12.085Z
Cooperative conversational threading 2015-04-15T18:40:50.820Z
Open Thread, Apr. 06 - Apr. 12, 2015 2015-04-06T14:18:34.872Z
[LINK] Interview with "Ex Machina" director Alex Garland 2015-04-02T13:46:56.324Z
[Link] Eric S. Raymond - Me and Less Wrong 2014-12-05T23:44:57.913Z
Meetup : London social meetup in my flat 2014-11-19T23:55:37.211Z
Meetup : London social meetup 2014-09-25T16:35:18.705Z
Meetup : London social meetup 2014-09-07T11:26:52.626Z
Meetup : London social meetup - possibly in a park 2014-07-22T17:20:28.288Z
Meetup : London social meetup - possibly in a park 2014-07-04T23:22:56.836Z
How has technology changed social skills? 2014-06-08T12:41:29.581Z
Meetup : London social meetup - possibly in a park 2014-05-21T13:54:16.372Z
Meetup : London social meetup - possibly in a park 2014-05-14T13:27:30.586Z
Meetup : London social meetup - possibly in a park 2014-05-09T13:37:19.129Z
May Monthly Bragging Thread 2014-05-04T08:21:17.681Z

Comments

Comment by philh on "If" is in the map · 2021-01-15T14:25:35.192Z · LW · GW

Stipulate that we want to choose some boolean function on two inputs to represent "if P, then Q". Then we want (T → T) = T and (T → F) = F, or what are we even doing with our lives.

So we have four choices for how to define (F → T) and (F → F). The standard choice says both are true. What about other choices?

Suppose we have (F → F) = F. Then we've translated "if P, then Q" into something that asserts Q and maybe (depending on how we define (F → T)) also asserts P. That seems like a bad translation. So let's say (F → F) = T. (Notably, if we add "undefined", then (F → F) = undefined has the same problem: when P and Q are defined, it lets P → Q be true only if Q is true.) Without this, we can't really translate "if P, then Q. Not Q. Therefore, not P," because (P → Q) ∧ ¬Q is a contradiction. (Which does still give us ¬P, but it also gives us P, so.)

All that's left is (F → T). If we say this is false, then our "if P, then Q" is translated into something that means "P and Q are either both true, or both false", or P ↔ Q. That seems like a bad translation too.

Basically, "if P, then Q" just doesn't translate very well into boolean logic, but all the other ways to translate it seem worse.
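
As a sanity check, a small illustrative sketch (not from the original comment) that enumerates the four candidate truth tables and identifies which familiar connective each one corresponds to:

# Fix f(T,T)=T and f(T,F)=F, vary f(F,T) and f(F,F), and see what each choice gives us.
from itertools import product

def identify(f):
    # Compare f against some familiar two-place connectives on all four inputs.
    candidates = {
        "P → Q (the standard choice)": lambda p, q: (not p) or q,
        "Q on its own": lambda p, q: q,
        "P ↔ Q": lambda p, q: p == q,
        "P ∧ Q": lambda p, q: p and q,
    }
    for label, g in candidates.items():
        if all(f(p, q) == g(p, q) for p, q in product([True, False], repeat=2)):
            return label
    return "something else"

for f_FT, f_FF in product([True, False], repeat=2):
    table = {(True, True): True, (True, False): False,
             (False, True): f_FT, (False, False): f_FF}
    f = lambda p, q, table=table: table[(p, q)]
    print(f"f(F,T)={f_FT}, f(F,F)={f_FF}: behaves like {identify(f)}")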

Comment by philh on Pseudorandomness contest: prizes, results, and analysis · 2021-01-15T13:59:31.602Z · LW · GW

I was string #54, and scored -1.7 in round 2, fairly middle of the pack. My probabilities:

  • 24 × 0%, 3 were real (13%)
  • 4 × 5%, 0 were real (0%)
  • 5 × 10%, 3 were real (60%)
  • 4 × 15%, 3 were real (75%) (!)
  • 2 × 20%, 1 was real (50%)
  • 7 × 25%, 1 was real (14%)
  • 8 × 30%, 7 were real (88%) (!)
  • 14 × 75%, 10 were real (71%)
  • 40 × 80%, 24 were real (60%)
  • 16 × 85%, 10 were real (63%)

I think the thing I found difficult was that I didn't know how to do a Bayesian update. I can calculate the probability of a statistic under the "random" hypothesis, but I had no idea what probability to assign it under the "fake" hypothesis, so what next?

In the end I just sorted the strings by a certain statistic and went with my gut, taking into account other info like "number of 1s" and "was this my own string", and starting with a baseline of "given the number of fakes I'm pretty sure I've eliminated, I don't think I can be more than, uh, about 85% confident that any given string is real". I worked from both ends and tried to get my average probability somewhere close to 50%, but that left a lot of space in the middle that I didn't end up using. Clearly this didn't work very well.

(The statistic was something like, probability of the observed "longest run length" combined with "number of changes from 1→0 or 0→1". The two aren't independent, so I got longest run length in closed form and did many simulations to get a rough distribution for number of changes given that.)
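
A rough sketch of that kind of simulation, for concreteness (the string length of 150, the trial count, and the conditioning value are illustrative assumptions; this is a reconstruction, not the actual contest code):

import random
from collections import Counter

def longest_run(bits):
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def num_changes(bits):
    # number of 1→0 or 0→1 transitions
    return sum(a != b for a, b in zip(bits, bits[1:]))

def simulate(n=150, trials=100_000, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        bits = [rng.randrange(2) for _ in range(n)]
        counts[(longest_run(bits), num_changes(bits))] += 1
    return counts

counts = simulate()
# rough distribution of "number of changes" given a particular longest run length:
given = {changes: c for (run, changes), c in counts.items() if run == 8}
total = sum(given.values())
print({changes: c / total for changes, c in sorted(given.items())})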

Comment by philh on The Amish, and Strategic Norms around Technology · 2021-01-10T21:37:28.892Z · LW · GW

Some other things that come to mind:

  • Dragon Army previously tried the thing this post recommends trying. I don't know quite what to make of this; it seems like at least weak evidence that social engineering is hard.
  • I've seen discussion about whether MAPLE is harmful to its members' epistemics/ability-to-interact-with-the-world (despite being at least rat-adjacent). I don't have a strong object-level opinion on that. And even if it's true, that doesn't mean we can't take good ideas from them. But it might mean we want to be careful about it?
Comment by philh on The Amish, and Strategic Norms around Technology · 2021-01-10T21:36:47.034Z · LW · GW

I like this post a lot.

I'm noticing an unspoken assumption: that Amish culture hasn't changed "much" since the 1800s. If that's not the case... it's not that anything here would necessarily be false, but it would be an important omission.

Like, taking this post as something that it's not-quite but also not-really-not, it uses the Amish as an example in support of a thesis: "cultural engineering is possible". You can, as a society, decide where you want your society to go and then go there. The Amish are an existence proof, and Ray bounces from them to asking how others can do it? What can we use from the Amish, and what is unlikely to work because the Amish had certain advantages we lack?

But this only really makes sense if the Amish managed to steer their culture successfully. If they put a bunch of effort into social engineering and got random results, such that their society is now different from broader US society but also different from what they started with, they don't tell us much. Or, different from what they started with might be fine, but we'd want it to be mostly deliberately different.

(If the Amish are mostly just trying to keep their society the same, that seems like another advantage they had that the post doesn't mention. Staying still seems easier than moving in a specific direction.)

So, in what ways is Amish culture different since the 1800s? Have changes been deliberate (like "it would be good to change like this"), or forced (like "we can't stay the same in the face of X, we need to make one of this set of changes, we choose this one"), or accidental (like "whoops suddenly we have a completely different opinion on some important question")? What would the Amish of each (say) 50-year time period think of the Amish from the next one? 

I think if I tried to answer these questions here, this review would never get posted. I don't know much about the Amish myself. But they seem worth flagging. Depending on the answer to them, the post might turn out to have important omissions.

Thanks to Jacob Lagerros for comments on this review

Comment by philh on Uninformed Elevation of Trust · 2020-12-29T00:30:35.873Z · LW · GW

Roll to disbelieve.

The value of specific examples is that we can check whether the critics seem to know what they're talking about, both in their field (do they understand the ground truth) and regarding the sequences themselves (do they know what Eliezer is saying). Simply telling us there are many examples does not, I believe, fulfill the intent of the question. Which is fine, you have no obligation to answer it, but I think it's worth pointing out.

To be clear, I'm sure you can find people in each of those groups making each of those criticisms. I do not believe those criticisms would be consensus in each of those groups. Certainly not all of them, and not on the level of "this is terrible". I remember, for example, physicists talking about the quantum mechanics sequence like "yeah, it's weirdly presented and I don't agree with the conclusion, but the science is basically accurate".

Comment by philh on Rule Thinkers In, Not Out · 2020-12-28T19:30:34.120Z · LW · GW

I think I agree with the thrust of this, but I think the comment section raises caveats that seem important. Scott's acknowledged that there's danger in this, and I hope an updated version would put that in the post.

But also...

Steven Pinker is a black box who occasionally spits out ideas, opinions, and arguments for you to evaluate. If some of them are arguments you wouldn’t have come up with on your own, then he’s doing you a service. If 50% of them are false, then the best-case scenario is that they’re moronically, obviously false, so that you can reject them quickly and get on with your life.

This seems like a strange model to use. We don't know, a priori, what % are false. If 50% are obviously false, probably most of the remainder are subtly false. Giving me subtly false arguments is no favor.

Scott doesn't tell us, in this essay, what Steven Pinker has given him / why Steven Pinker is ruled in. Has Steven Pinker given him valuable insights? How does Scott know they're valuable? (There may have been some implicit context when this was posted. Possibly Scott had recently reviewed a Pinker book.)

Given Anna's example,

Julia Galef helpfully notes a case where Steven Pinker straightforwardly misrepresents basic facts about who said what. This is helpful to me in ruling out Steven Pinker as someone who I can trust not to lie to me about even straightforwardly checkable facts.

I find myself wondering, has Scott checked Pinker's straightforwardly checkable facts?

I wouldn't be surprised if he has. The point of these questions isn't to say that Pinker shouldn't be ruled in, but that the questions need to be asked and answered. And the essay doesn't really acknowledge that that's actually kind of hard. It's even somewhat dismissive; "all you have to do is *test* some stuff to *see if it’s true*?" Well, the Large Hadron Collider cost €7.5 billion. On a less extreme scale, I recently wanted to check some of Robert Ellickson's work; that cost me, I believe, tens of hours. And that was only checking things close to my own specialty. I've done work that could have ruled him out and didn't, but is that enough to say he's ruled in?

So this advice only seems good if you're willing and able to put in the time to find and refute the bad arguments. Not only that: only if you actually will put in that time. Not everyone can, not everyone wants to, and not everyone will. (This includes: "if you fact-check something and discover that it's false, the thing doesn't nevertheless propagate through your models, influencing your downstream beliefs in ways it shouldn't".)

If you're not going to do that... I don't know. Maybe this is still good advice, but I think that discussion would be a different essay, and my sense is that Scott wasn't actually trying to give that advice here.

In the comments, cousin_it and gjm describe the people who can and will do such work as "angel investors", which seems apt.

I feel like right now, the essay is advising people to be angel investors, and not acknowledging that that's risky if you're not careful, and difficult to do carefully. That feels like an overstep. A more careful version might instead advise:

  • Some people have done some great work and some silly work. If you know which is which (e.g. because others have fact checked, or time has vindicated), feel free to pay attention to the great and ignore the silly.
  • Don't automatically dismiss people just because they've said some silly things. Take that fact into account when evaluating the things they say that aren't obviously silly, and deciding whether to actually evaluate them. But don't let that fact take the place of actually evaluating those things. Like, given "Steven Pinker said obviously silly things about AI", don't say "... so the rest of The Nurture Assumption isn't worth paying attention to". Instead, say "... so I don't think it's worth me spending the time to look closer at The Nurture Assumption right now". And allow for the possibility of changing that to "... but The Nurture Assumption is getting a lot of good press, maybe I'll look into it anyway".

(e: lightly edited for formatting and content)

Comment by philh on My Model of the New COVID Strain and US Response · 2020-12-28T02:07:38.866Z · LW · GW

35% that the new strain is a nothingburger.

I'm surprised you put this so high. Would you accept my $30-150 to your $10-50 that this doesn't turn out to be a nothingburger?

Comment by philh on On Robin Hanson’s “Social Proof, but of What?” · 2020-12-24T12:19:51.753Z · LW · GW

My read is that simulacra levels are more about the reasons we say things than the reasons we believe them.

Comment by philh on The Darwin Game - Conclusion · 2020-12-05T16:05:40.635Z · LW · GW

Thanks for running this! I'm a little disappointed the clique didn't make it to round 90, so that my custom code never ran, but so it goes.

Marking my predictions:

  • I win: 20%.
  • A CloneBot wins: 75%.
  • At least one clique member submits a non-CloneBot (by accident or design): 60%.
  • At least one clique member fails to submit anything: 60%. (I think this happened? I don't remember where someone said that though.)
  • At least one bot tries to simulate me after the showdown and doesn't get disqualified: 10%.
  • At least one bot tries to simulate me after the showdown and succeeds: 5%.
  • At least one CloneBot manages to act out: 5%.
  • I get disqualified: 5%.

Only two where I was on the wrong side of 50%, but giving 5% to a CloneBot acting out is embarrassing. I think if I'd said 10% I'd feel okay with it. I'm curious whether, if we started at round 90, the "after the showdown" predictions would have gone the other way; but I think I did try to price in the chance of never making it there when I made them, so. (Were there even any simulators in the clique, other than EarlyBirdMimicBot which wouldn't have tried against any of us?)

Comment by philh on Final Version Perfected: An Underused Execution Algorithm · 2020-12-01T13:48:13.913Z · LW · GW

My take is that quicksort, merge sort and all other common sorts involve physically rearranging memory in a way that would be a bad fit for a list on paper. This one doesn't change the original list, it just emits its elements one at a time in order.

This is worst-case O(n^2) - if they're currently in order (i.e. sorted from highest to lowest priority), you'd do O(n^2) comparisons. Best case O(n) comparisons, if they're currently in reverse order. (You could interpret it as high priority elements being high in the sort order, but then the algorithm emits elements one at a time in reverse order.) Memory use is O(n) pointers/indexes (the list of marked elements), O(n) bools (a list of which items have already been crossed off) and O(1) bookkeeping.

You could see it as a variant on selection sort - in particular it has the same property of "find the least element, then the next least, and so on" that lets you stop part way through if you only want to do three tasks. But because of the "emit items instead of changing the original list" behaviour, instead of just keeping track of the current minimum, we can keep track of a descending subsequence of the original list.

In pseudocode, I think we get something like...

let input: [Task] be the input list (n elements)
initialize:
  emitted: [Bool] = [False, False, ..., False] (n elements)
  descending-stack: [Int] = [-1]
  current-min: (Infinity | Task) = Infinity
  last-pop: Int = -1

repeat:
    for i from last-pop + 1 to n-1:
        if not emitted[i] and input[i] < current-min:
            push(descending-stack, i)
            current-min = input[i]

    last-pop = pop(descending-stack)
    if last-pop == -1:
        finished()

    emit(input[last-pop])
    emitted[last-pop] = True
    # reset the comparison baseline to the new top of the stack
    current-min = Infinity if peek(descending-stack) == -1 else input[peek(descending-stack)]
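
For concreteness, the same algorithm as runnable Python (an illustrative translation of the pseudocode above, with a toy comparison function):

def fvp_order(tasks, prefer):
    """Yield indices of `tasks` in the order FVP would do them.
    prefer(a, b) == True means you'd rather do task a than task b."""
    n = len(tasks)
    done = [False] * n
    stack = []        # marked tasks; each one preferred over the one below it
    scan_from = 0
    while True:
        # (re)scan, comparing each task against the most recently marked one
        for i in range(scan_from, n):
            if not done[i] and (not stack or prefer(tasks[i], tasks[stack[-1]])):
                stack.append(i)
        if not stack:
            return
        # do the last-marked task, then resume scanning just after it
        top = stack.pop()
        done[top] = True
        yield top
        scan_from = top + 1

# example: lower number = "want to do sooner"
tasks = [3, 1, 4, 1, 5, 9, 2, 6]
print([tasks[i] for i in fvp_order(tasks, lambda a, b: a < b)])  # [1, 1, 2, 3, 4, 5, 6, 9]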
Comment by philh on Final Version Perfected: An Underused Execution Algorithm · 2020-12-01T12:10:44.256Z · LW · GW

I interpreted it as... suppose the items are, in order: play video games, rearrange furniture, work out. That's the reverse of the order I want to do them in right now, so I mark them all and go work out. Then after working out, I'm supposed to rearrange furniture. But if I started from scratch here, I'd want to play games first, to rest. How often does that sort of thing happen?

Comment by philh on Anatomy of a Gear · 2020-11-18T13:20:40.025Z · LW · GW

On what I take to be a related note, I recently enjoyed a short essay on, I guess, the gears of gears.

Comment by philh on Specialized Labor and Counterfactual Compensation · 2020-11-15T11:30:50.824Z · LW · GW

<3

Comment by philh on Incentive Problems With Current Forecasting Competitions. · 2020-11-11T22:31:09.244Z · LW · GW

At one event I went to, we were told there'd be a prize distributed as some random function of predictive scores, and there was. What they didn't tell us was that there was also a prize for whoever got the highest score. Presumably they kept quiet about that one to avoid giving bad incentives, though it's not a solution that's generally available.

Comment by philh on The Darwin Game - Rounds 1 to 2 · 2020-11-11T16:50:30.854Z · LW · GW

As I wrote in The Phantom Menace, several people asked me questions about how to disqualify opponents. In the end, only Taleuntum did and then ze pulled a Petrov. So there were no simulator killers.

Incomprehensibot was intended to be a simulator killer, though of course not until the clones are free to act independently. Did I fail at that?

Comment by philh on Covid 11/5: Don’t Mention the War · 2020-11-09T06:33:21.024Z · LW · GW

It sounds like your 400k deaths median is "until herd immunity", but then there's a period where R < 1 but because people are currently infected it takes time for the virus to die out, even assuming no behavioral changes. Do you have a model for how long, and how many deaths, that later period takes?

Comment by philh on Sunday, Nov 8: Tuning Your Cognitive Algorithms · 2020-11-08T20:15:58.448Z · LW · GW

The guest pass works for me, but then I get a "Congratulations! Your invite code to Guest Pass is valid (and will be for next many hours). Please take a look at our guidelines below, then join the party!" message, and when I click on "enter the garden", I get a blank page. The devtools console says "Failed to read the 'localStorage' property from 'Window': Access is denied for this document." with the callstack pointing at gather.town/bundle.js. Using chromium, which has worked for me on gather.town before.

(edit: found a link to it on gather.town itself and that works)

Comment by philh on Sunday, Nov 8: Tuning Your Cognitive Algorithms · 2020-11-08T20:10:56.468Z · LW · GW

Did you try the whole URL? i.e. try clicking this link

Comment by philh on Rationalist Town Hall: Pandemic Edition · 2020-11-01T18:35:43.573Z · LW · GW

Currently for me it says 20:00 UTC - 22:00 UTC, and I think that's correct as California left DST this morning.

Comment by philh on Why does History assume equal national intelligence? · 2020-11-01T13:06:27.286Z · LW · GW

This feels unsatisfying to me and I'm not fully sure why.

If we want to know why Johnny English does things with his left hand, we could say "because he's left-handed". But we know he's left-handed because he does things with his left hand. That seems just as circular, but still basically fine as an answer? More broadly we'd say "look, some people just favor their left hand. We don't know exactly why, but there's a fraction of the population who tends to do things with their left hand, even when it causes them to smear ink or makes scissors less efficient. We call these people left-handed."

So when we say "Johnny English does things with his left hand because he's left-handed"... it's arguably more definition than explanation, but it does also have predictive power. It points at a pattern that lets us say "okay, Johnny English will probably use his left hand in this situation too, and if we try to make him use his right instead he probably won't do a very good job".

Comment by philh on The Darwin Game - Rounds 0 to 10 · 2020-10-26T23:27:01.122Z · LW · GW

[Blue] Clone Army. 10 players pledged to submit clone bots. 8 followed through, 1 didn’t and Multicore submitted a [Red] mimic bot.

To clarify, the 8 all successfully recognize each other as clones, and the one who didn't follow through submitted nothing? Relevant for scoring my predictions on the last comment thread.

Comment by philh on The Darwin Game - Rounds 0 to 10 · 2020-10-26T23:16:59.183Z · LW · GW

So, uh. Unless I made a silly mistake somewhere, or the version in the tournament is different from what you posted in the thread... I specifically tested to make sure incomprehensibot would get ASTBot disqualified if we both survived that long. Sorry.

(Some of my requested changes to the CloneBot common code were to route around a bug in ASTBot that made it crash before I wanted it to, in ways it could recover from. ASTBot can't really handle top-level import statements due to details I don't really understand about python's namespace handling. So I requested that CloneBot not include any of those.)

Comment by philh on The Darwin Game · 2020-10-26T15:06:41.239Z · LW · GW

  • At least one bot tries to simulate me after the showdown and doesn’t get disqualified: 10%.
  • At least one bot tries to simulate me after the showdown and succeeds: 5%.

I now think these were overconfident. I think it would be fairly easy to simulate incomprehensibot safely; but hard to simulate incomprehensibot in a way that would be both safe and generically useful.

The difficulty with simulating is that you need to track your opponent's internal state. If you just call MyOpponentBot(round).move(myLastMove) you'll be simulating "what does my opponent do on the first turn of the game, if it gets told that my last move was...". If you do this against incomprehensibot, and myLastMove is not None, incomprehensibot will figure out what's up and try to crash you.

So at the beginning, you initialize self.opponent = MyOpponentBot(round). And then every turn, you call self.opponent.move(myLastMove) to see what it's going to do next. I don't expect incomprehensibot to figure this out.

But if your opponent has any random component to it, you want to do that a couple of times at least to see what's going to happen. But if you call that function multiple times, you're instead simulating "what does my opponent do if it sees me play myLastMove several times in a row". And so you need to reset state using deepcopy or multiprocessing or something, and incomprehensibot has ways to figure out if you've done that. (Or I suppose you can just initialize a fresh copy and loop over the game history running .move(), which would be safe.)
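
A sketch of that last fresh-copy-and-replay approach (names here are made up: OpponentBot stands in for whatever class was extracted from their source, and the sample count is arbitrary):

def predict_opponent_moves(OpponentBot, round_number, my_moves_so_far, samples=10):
    """Sample the opponent's likely next move by replaying the real history
    into a fresh instance each time, so repeated queries never corrupt state."""
    predictions = []
    for _ in range(samples):
        opponent = OpponentBot(round_number)   # fresh state every sample
        move = opponent.move(None)             # its opening move
        for my_move in my_moves_so_far:
            move = opponent.move(my_move)      # feed it the history we actually played
        predictions.append(move)               # its move for the upcoming turn
    return predictions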

But actually, the simple "initialize once and call .move() once per turn" isn't obviously terrible? Especially if you keep track of your predictions and stop paying attention to them once they deviate more than a certain amount (possibly zero) from reality. And Taleuntum's bot might catch that, I'm not sure, but I think Incomprehensibot wouldn't.

I think at some point I decided basically no one would do that, and then at some other point I forgot that it was even a possibility? But I now think that was silly of me, and someone might well do that and last until the showdown round. Trying to put a number on that would involve thinking in more depth than I did for any of my other predictions, so I'm not going to try, just leave this note.

Comment by philh on PredictIt: Presidential Market is Increasingly Wrong · 2020-10-25T21:47:59.654Z · LW · GW

I take it you mean "people might be betting on the possibility that Trump wins the election, as forecasts predict, but remains president by refusing to concede"?

Betfair's fine print excludes that possibility from the market:

This market will be settled according to the candidate that has the most projected Electoral College votes won at the 2020 presidential election. Any subsequent events such as a ‘faithless elector’ will have no effect on the settlement of this market. In the event that no Presidential candidate receives a majority of the projected Electoral College votes, this market will be settled on the person chosen as President in accordance with the procedures set out by the Twelfth Amendment to the United States Constitution.

I don't have PredictIt's fine print in front of me, but IIRC it's similar but less explicit.

Comment by philh on PredictIt: Presidential Market is Increasingly Wrong · 2020-10-22T17:05:42.510Z · LW · GW

A related crazy-seeming market here is Trump's exit date. Will he complete his first term? Predictit says 85% yes, betfair says 90% yes. The only way I can see that not happening is if he loses the election and quits out of spite or whatever, and I wouldn't be confident enough to dismiss that out of hand. But Good judgment open doesn't think it's likely, 99% he completes his term. (I guess unless he quits on inauguration day?)

Comment by philh on The Darwin Game · 2020-10-20T23:44:51.150Z · LW · GW

Oh yeah, that's true as far as I know. I guess it depends how much we trust ourselves to find all instances of this hole. A priori I would have thought "python sees a newline where splitlines doesn't" was just as likely as the reverse. (I'm actually not sure why we don't see it, I looked up what I thought was the source code for the function and it looked like it should only split on \n, \r and \r\n. But that's not what it does. Maybe there's a C implementation of it and a python implementation?)

Comment by philh on The Darwin Game · 2020-10-20T21:02:57.328Z · LW · GW

If we don't use splitlines we instead need to use something similar, right? Like, even if we don't need to worry about LF versus CRLF (which was a genuine suggestion I made), we still need to figure out if someone's got any de-indents after the start of the payload. And I don't expect us to do better without splitlines than with it.

Comment by philh on PredictIt: Presidential Market is Increasingly Wrong · 2020-10-20T16:10:56.725Z · LW · GW

What I cannot explain, at all, is how this can be true if on June 20, three months ago, the market was 63-39, and now it’s 65-40, all but unchanged. Trump improved a bit, then got worse again, with Biden’s low being at 55.

Note that on June 20, the 538 model had Trump at 22% to win (it began at 30% on June 1). The progression he offers makes sense and is consistent. The market’s doesn’t, and isn’t.

Just as a note, three months ago was July 20th, and that seems to be the date you used for the market. (The market was more like 55-45 on June 20th.) The 538 forecast was indeed 22% for Trump on June 20th, and had actually gone up to 25% by July 20th, but I don't think that changes much.

Comment by philh on Moloch games · 2020-10-20T13:17:22.116Z · LW · GW

Intuition. A Moloch game is a game such that there is a utility function U, called "the Moloch's utility function", such that if the agents behave individually rationally, then they collectively behave as a "Moloch" that controls all players simultaneously and optimizes U. In particular, the Nash equilibria correspond to local optima of U.

Minor, but this tripped me up. My read of "controls all players simultaneously" would be that there's no such thing as a local optimum, it can just move directly to the global optimum from any other state. I'm not sure what would be a better wording though, and your non-intuitive definition was clear enough to set me right.

Comment by philh on The Darwin Game · 2020-10-20T08:44:35.293Z · LW · GW

I do think it would be hard to obfuscate in a way that wasn't fairly easy to detect as obfuscation. Throw out anything that uses import, any variables with __ or a handful of builtin functions and you should be good. (There's only a smallish list of builtins, I couldn't confidently say which ones to blacklist right now but I do think someone could figure out a safe list without too much trouble.) In fact, I can't offhand think of any reason a simple bot would use strings except docstrings, maybe throw out anything with those, too.

(Of course my "5% a CloneBot manages to act out" was wrong, so take that for what it's worth.)
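
A rough sketch of that sort of filter, using the ast module (untested against the actual entries; the blacklist here is illustrative, not a claim of completeness):

import ast

BLACKLISTED_CALLS = {"exec", "eval", "compile", "getattr", "setattr",
                     "globals", "vars", "open", "__import__"}

def looks_suspicious(source: str) -> bool:
    tree = ast.parse(source)
    # collect docstring nodes so string literals used as docstrings get a pass
    docstrings = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if (node.body and isinstance(node.body[0], ast.Expr)
                    and isinstance(node.body[0].value, ast.Constant)
                    and isinstance(node.body[0].value.value, str)):
                docstrings.add(node.body[0].value)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return True
        if isinstance(node, ast.Name) and "__" in node.id:
            return True
        if isinstance(node, ast.Attribute) and "__" in node.attr:
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in BLACKLISTED_CALLS):
            return True
        if (isinstance(node, ast.Constant) and isinstance(node.value, str)
                and node not in docstrings):
            return True
    return False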

The iterated prisoner’s dilemma with shared source code tournament a few years ago had a lot of simulators, so I assume their rules were more friendly to simulators.

I know of two such - one (results - DMRB was mine) in Haskell where you could simulate but not see source, and an earlier one (results) in Scheme where you could see source.

I think in the Haskell one it would have been hard to figure out you were being simulated. I'm not sure about the scheme one.

Comment by philh on The Darwin Game · 2020-10-19T21:37:02.700Z · LW · GW

I confess I'm a bit confused, I thought in our PM conversation I was fairly explicit that that's what I was asking about, and you were fairly explicit that it was forbidden?

It's not a big deal - even if this was forbidden I'd think it would be totally fine not to disqualify simon, and I still don't actually expect it to have been useful for me.

Comment by philh on The Darwin Game · 2020-10-19T21:27:43.519Z · LW · GW

Clever! I looked for holes in mostly the same directions as you and didn't find anything. I think I either didn't think of "things splitlines will split on but python won't", or if I did I dismissed it as being not useful because I didn't consider comments.
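
To spell out the class of hole, an illustrative demo (\x0c here stands in for whichever character the actual exploit used; the point is just that str.splitlines() and the Python parser can disagree about where a line ends):

payload = "x = 1  # looks harmless\x0cx = 2\n"

# str.splitlines() treats form feed as a line boundary, so a line-by-line
# source check sees two separate statements:
print(payload.splitlines())   # ['x = 1  # looks harmless', 'x = 2']

# ...but the Python parser does not, so when the code actually runs the second
# statement is still inside the comment:
namespace = {}
exec(payload, namespace)
print(namespace["x"])         # 1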

Comment by philh on The Darwin Game · 2020-10-19T17:59:14.871Z · LW · GW

Updated with a link to my code. I also put yours in to see how we'd fare against each other one-on-one - from quick experimentation, looks like we both get close to 2.5 points/turn, but I exploit you for approximately one point every few hundred turns, leaving me the eventual victor. :D I haven't looked closely to see where that comes from.

Of course too much depends on what other bots are around.

Comment by philh on The Darwin Game · 2020-10-19T15:58:01.639Z · LW · GW

Conditioned on "any CloneBot wins" I've given myself about 25%.

10% in that conditional would definitely be too low - I think I have above-baseline chances on all of "successfully submit a bot", "bot is a CloneBot" and "don't get disqualified". I think I expect at least three to fall to those hurdles, and five wouldn't surprise me. And of the rest, I still don't necessarily expect most of them to be very serious attempts.

By "act out" I mean it's a bot that's recognized as a CloneBot by the others but doesn't act like one - most likely cooperating with non-clones, but not-cooperating with clones would also count, it would just be silly as far as I can tell. I also include such a bot as a CloneBot for the 75%.

Comment by philh on The Darwin Game · 2020-10-19T15:31:29.406Z · LW · GW

Well played!

Comment by philh on The Darwin Game · 2020-10-19T15:30:57.864Z · LW · GW

Putting data in global state is forbidden, yeah, even if you don't do anything with it. I was a bit surprised.

Just to be clear, this would only be forbidden if you put it at the top level. If you put it in your class it would be fine. So

class CloneBot():
    ...
    def payload(self) :
        ...

    foo = 'bar' # allowed

foo = 'bar' # forbidden
Comment by philh on Coronavirus Justified Practical Advice Summary · 2020-10-19T12:56:35.900Z · LW · GW

Note that my only mention of zinc in that comment was relating to zinc lozenges and the common cold. It seems like you're talking about dietary zinc and Covid-19.

Comment by philh on The Darwin Game · 2020-10-19T12:44:35.572Z · LW · GW

by defining the __new__() method of the class after the payload

Incidentally, you could also just redefine existing methods, which was how I planned to do it. Like,

class Foo():
    def __init__(self):
        self.x = 1

    def __init__(self):
        self.x = 2

Foo().x # 2
Comment by philh on PredictIt: Presidential Market is Increasingly Wrong · 2020-10-19T12:02:57.881Z · LW · GW

Super not an expert, saying it loud so I can be corrected if wrong:

I don't think time value of money is the main thing here. The observed pattern seems to be that as the election draws closer, people get more information but the market stubbornly refuses to do so. If that pattern continues, then people get more edge as time goes on, meaning future bets will be more advantageous than current bets.

If your strategy is something like "put $100 on Biden as long as I think his odds are more than 5% better than the market thinks" this might not make much difference; waiting only helps in case Biden's odds-according-to-you suddenly drop a lot. But if you're going to bet different amounts depending on the gap, then waiting also helps in case Biden's odds-according-to-you drop a little. (I think if they go up, you can just put more money in, so waiting hasn't gained you anything. But you have to have some probability that they drop.)

Comment by philh on PredictIt: Presidential Market is Increasingly Wrong · 2020-10-19T11:54:36.091Z · LW · GW

That might explain a recent sudden divergence. I don't think it explains the trend Zvi describes in the post.

Comment by philh on The Darwin Game · 2020-10-19T08:25:27.434Z · LW · GW

I lied that I’ve already submitted one program detecting and crashing simulators. ... I added another lie that the method of detecting simulators was my friend’s idea (hopefully suggesting that there is another contestant with the same method outside the clique). I’m curious how believable my lies were, I felt them to be pretty weak, hopefully it’s only because of my inside view.

I believed both of these lies, though if I'd come to rely on them at all I might have questioned them. But I assumed your friend was in the clique.

Comment by philh on The Darwin Game · 2020-10-19T08:21:54.317Z · LW · GW

Will post a link to a github repo with my code later today (when I'm not meant to be working), but for now, here's my thought processes.

(Edit: my code is here. My entry is in incomprehensibot.)

General strategy:

I was undecided on joining the clique, but curious. I didn't want to be a sucker if someone (possibly the clique organizer) found a way to betray it. I sent out feelers, and Vanilla_Cabs shared the clique bot code with me.

I saw a way to defect against the clique. I think that's when I decided to do so, though I may have had the idea in my head beforehand. I would call my entry "bastardBot" and it would be glorious. I told Vanilla_Cabs I was in. They asked if the code was airtight. "I don't see anything I want to flag."

Someone else found that same bug, and was more honest than I. I spent some time trying to work around the fix, but couldn't see anything. I tried to get Vanilla_cabs to put a new hole in, under the pretext that I wanted some code at the top level - this was true, but it was only marginally useful. I couldn't think of any new holes that wouldn't be really freaking obvious, so instead I tried being vague about my requirements to see if they'd suggest something I could use, but they didn't. Eventually we just settled on "the exact line foo = 'bar' is permitted as an exception", and I didn't see what I could do with that.

Later, lsusr told me that that line would get me disqualified. I didn't say anything, in the hopes some clique member would wonder what it was for, include it in their bot just in case, and get disqualified.

I feel a little bad about all this, and hope Vanilla_cabs has no hard feelings.

My backup plan was: don't let anyone simulate me, and get them disqualified if they try. (New name: "incomprehensibot".) "jailbreaker.py" shows my tricks here. Defense seemed more effective than offense, and I didn't think I could safely simulate my opponent, and especially not do so safely and usefully within the time limits, so I gave up on the idea. As for my actual moves, I didn't have much in mind. After rereading (or at least reskimming) "the Darwin pregame" I settled on this:

After the showdown round, start off with something like Zvi's "I'll let you do better than me, but you could do even better by cooperating with me". Gradually move towards "I won't let you do better than me" as time progresses; if my opponent was more than a certain number of points ahead of me, I'd refuse to play less than 3. (I chose the number of points based on expecting 550 turns on average, and gave it a minimum of 5 to allow some coordination dance early on.) Early on, skew towards playing 2 initially; if opponents randomize between 2 and 3, and pick 3 with probability >= 1/3, then 2 maximizes my score. Later, skew towards playing 3 initially, which increases my probability of beating my opponent.

"payload.py" shows my approach here. I modelled it off the early-round CloneBot moves against non-clones. If last round had been a total of 5, I'd play their last move. If it had been 4, I'd do the same, but maybe add a little. If it had been more, I'd repeat my own move, but maybe subtract one. In the 5 and >5 cases, I had some "pushing" behaviour to see if I could exploit them: if I haven't had a chance to push yet, or if pushing had worked well (or seemed like it would have worked well) in the past, or if I just hadn't tried it recently, I'd try to take one more point than I really had any right to. I didn't do that in the <5 case because that situation was my only source of randomness, which seemed important somehow.

(I'm a bit confused about, if the last-round total isn't five, should I base off my own previous move or theirs? My decisions here weren't principled.)

If this made me play 0 (I dunno if it ever would), I'd make it a 1. If it made me play less than 3, and I was too far behind (depending on round), I'd make it a 3.
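
In rough Python, the shape of the rule described above (a simplified sketch that leaves out the "pushing" behaviour; the probabilities, the opening-move schedule and the catch-up threshold are placeholders, not the submitted values):

import random

def payload_move(my_last, their_last, my_score, their_score, tournament_round):
    if my_last is None:
        # opening move: skew towards 2 early in the tournament, towards 3 later
        p_three = min(0.9, 0.3 + tournament_round / 200)   # placeholder schedule
        return 3 if random.random() < p_three else 2
    total = my_last + their_last
    if total == 5:
        move = their_last                          # total of 5: mirror their last move
    elif total < 5:
        move = their_last + random.choice([0, 1])  # under 5: same, sometimes grab a bit more
    else:
        move = my_last - random.choice([0, 1])     # over 5: repeat my move, sometimes back off
    if move == 0:
        move = 1
    # refuse to keep falling behind once the gap is too large
    if move < 3 and their_score - my_score > 20:   # placeholder threshold
        move = 3
    return move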

Just before I went to bed Saturday night, someone sent a message to the clique group saying not to try to break simulators. Because if a clique member simulates us and gets disqualified, the tournament is restarted and the clique is smaller. That was completely right, and I felt stupid for not thinking of it sooner.

I still decided to ignore it, because I thought the game would be small enough that "fewer opponents" was better than "bigger clique". Overnight someone else said they'd already submitted a simulation-breaker, so I dunno if anyone ended up playing a simulator.

Right towards the end I started doing numerical analysis, because early on I was too enamoured with my own cleverness to notice what a good idea it was. I didn't have time to do anything thoroughly, but based on running my payload against ThreeBot (which gets 148-222 early, 10-15 late) I reduced my exploitability ramp-down from 100 rounds (chosen fairly arbitrarily) to 20 (still fairly arbitrary). Come to think of it, I don't think I compared "what proportion of the pool do I need to eventually win" between my early and late game behaviors.

It would have been interesting to have some kind of logging such that my bot could report "I think I'm being simulated right now, and this is how I know" and afterwards lsusr could tell me how often that happened. I assume that would be significant work for lsusr to set up though, and it adds attack surface.

Predictions:

  • I win: 20%.
  • A CloneBot wins: 75%.
  • At least one clique member submits a non-CloneBot (by accident or design): 60%.
  • At least one clique member fails to submit anything: 60%.
  • At least one bot tries to simulate me after the showdown and doesn't get disqualified: 10%.
  • At least one bot tries to simulate me after the showdown and succeeds: 5%.
  • At least one CloneBot manages to act out: 5%.
  • I get disqualified: 5%.
Comment by philh on The Darwin Game · 2020-10-16T21:55:47.498Z · LW · GW

To check, what timezone is the deadline in?

Comment by philh on The Darwin Game · 2020-10-15T09:36:34.644Z · LW · GW

Oh, geez. I figured it would be too long, but I didn't think about just how much too long. Yeah, with these constraints, even 5s per hundred moves I agree is unreasonable.

Caching seems easy enough to implement independently, I think. No need for you to add it.

Comment by philh on The Darwin Game · 2020-10-14T22:03:48.987Z · LW · GW

Thanks. I confess I'd been hoping for more like 100x that, but not really expecting it :p

Comment by philh on Fermi Challenge: Trains and Air Cargo · 2020-10-14T10:12:05.775Z · LW · GW

Comment on q1:

It looks like your calculations are giving you square miles of track. If a track is 1/1000 of a mile wide (1.6 meters? sure, close enough, judging by the height of a damsel in distress), you'd have 2.5 million linear miles from your first estimate, and 100 million linear miles from your second.

Comment by philh on The Darwin Game · 2020-10-14T08:57:44.302Z · LW · GW

Hm. Can we get a "you can use at least this amount of time per move and not be disqualified"? Without wanting to say too much, I have a strategy in mind that would rely on knowing a certain runtime is safe. (Allowing for some amount of jankiness, so that if the limit was 5s I'd consider 4s safe.)

Comment by philh on The Darwin Game · 2020-10-13T23:09:47.302Z · LW · GW

I don't know how likely it is to make a difference, but what version of python 3?

Comment by philh on Fermi Challenge: Trains and Air Cargo · 2020-10-06T13:23:46.864Z · LW · GW

Another attempt at q2:

Suppose air freight has dectupled every decade, starting at one metric ton in 1909. Then we get 10^10 metric tons in 2009 and 10^11 in 2019. That's 4½ orders of magnitude more than my other answer. :/

I currently suspect this one is too high and that one is too low, but that one is closer.

Comment by philh on Fermi Challenge: Trains and Air Cargo · 2020-10-06T12:31:32.212Z · LW · GW

q2:

I think a lot more freight goes by boat than plane. Let's say plane is 1% of boat.

I think an aircraft carrier displaces, what, 100,000 metric tons? So it's maybe reasonable to guess that a respectable bulk transport can carry 100,000 metric tons of cargo.

Let's say at any given time there are 100 of those underway, on journeys lasting 30 days. That makes about 100,000,000 metric tons shipped annually by boat, and 1,000,000 by plane.

Between 2009 and 2019 I'm gonna guess it went up by enough to count as one order of magnitude. So let's split the difference and call it 300,000 in 2009 and 3,000,000 in 2019.
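
Sanity-checking that arithmetic with the guesses above (not real-world data):

ship_capacity_tons = 100_000      # one bulk carrier, guessed from carrier displacement
ships_underway = 100
journey_days = 30

journeys_per_year = 365 / journey_days                            # ~12
by_boat = ships_underway * ship_capacity_tons * journeys_per_year
by_plane = 0.01 * by_boat                                         # plane as 1% of boat
print(f"{by_boat:.1e} t/year by boat, {by_plane:.1e} t/year by plane")
# 1.2e+08 t/year by boat, 1.2e+06 t/year by plane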