Posts

Has anybody used quantification over utility functions to define "how good a model is"? 2021-02-02T15:14:54.386Z
Where numbers come from 2021-01-28T21:33:57.209Z
Demystifying the Second Law of Thermodynamics 2020-11-22T13:52:17.280Z
How much is known about the "inference rules" of logical induction? 2020-08-08T10:45:09.029Z
Bets and updating 2019-10-07T23:06:18.778Z
Joy in Discovery: Galois theory 2019-09-02T19:16:46.542Z
Eigil Rischel's Shortform 2019-08-30T20:37:38.828Z

Comments

Comment by Eigil Rischel (eigil-rischel) on Clarifying the Agent-Like Structure Problem · 2022-10-06T12:36:15.246Z · LW · GW

I mean, "is a large part of the state space" is basically what "high entropy" means!

For case 3, I think the right way to rule out this counterexample is the probabilistic criterion discussed by John - the vast majority of initial states for your computer don't include a zero-day exploit and a script to automatically deploy it. The only way to make this likely is to include you programming your computer in the picture, and of course you do have a world model (without which you could not have programmed your computer)

Comment by Eigil Rischel (eigil-rischel) on OpenAI Solves (Some) Formal Math Olympiad Problems · 2022-02-03T21:35:12.716Z · LW · GW

I think we're basically in agreement here (and I think your summary of the results is fair, I was mostly pushing back on a too-hyped tone coming from OpenAI, not from you)

Comment by Eigil Rischel (eigil-rischel) on OpenAI Solves (Some) Formal Math Olympiad Problems · 2022-02-03T18:05:59.039Z · LW · GW

Most of the heavy lifting in these proofs seem to be done by the Lean tactics. The comment "arguments to nlinarith are fully invented by our model" above a proof which is literally the single line nlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)] makes me feel like they're trying too hard to convince me this is impressive.

The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of "traditional" search methods for proving algebraic inequalities, rather than an impressive AI milestone. People on twitter have claimed that some of the other problems are also one-liners using existing proof assistant strategies - I find this claim totally plausible.

I would be much more impressed with an AI-generated proof of a combinatorics or number theory IMO problem (eg problem 1 or 5 from 2021). Someone with more experience in proof assistants probably has a better intuition for which problems are hard to solve with "dumb" searching like nlinarith, but this is my guess.

Comment by Eigil Rischel (eigil-rischel) on Eigil Rischel's Shortform · 2022-01-21T10:05:21.108Z · LW · GW

Yes, that's right. It's the same basic issue that leads to the Anvil Problem

Comment by Eigil Rischel (eigil-rischel) on Eigil Rischel's Shortform · 2022-01-20T21:41:14.116Z · LW · GW

Compare two views of "the universal prior"

  • AIXI: The external world is a Turing machine that receives our actions as input and produces our sensory impressions as output. Our prior belief about this Turing machine should be that it's simple, i.e. the Solomonoff prior
  • "The embedded prior": The "entire" world is a sort of Turing machine, which we happen to be one component of in some sense. Our prior for this Turing machine should be that it's simple (again, the Solomonoff prior), but we have to condition on the observation that it's complicated enough to contain observers ("Descartes' update"). (This is essentially Naturalized induction

I think of the difference between these as "solipsism" - AIXI gives its own existence a distinguished role in reality.

Importantly, the laws of physics seem fairly complicated in an absolute sense - clearly they require tens[1] or hundreds of bits to specify. This is evidence against solipsism, because on the solipsistic prior, we expect to interact with a largely empty universe. But they don't seem much more complicated than necessary for a universe that contains at least one observer, since the minimal source code for an observer is probably also fairly long.

More evidence against solipsism:

  • The laws of physics don't seem to privilege my frame of reference. This is a pretty astounding coincidence on the solipsistic viewpoint - it means we randomly picked a universe which simulates some observer-independent laws of physics, then picks out a specific point inside it, depending on some fairly complex parameters, to show me.
  • When I look out into the universe external to my mind, one of the things I find there is my brain, which really seems to contain a copy of my mind. This is another pretty startling coincidence on the solipsistic prior, that the external universe being run happens to contain this kind of representation of the Cartesian observe

  1. This is obviously a very small number but I'm trying to be maximally conservative here. ↩︎

Comment by Eigil Rischel (eigil-rischel) on davidad's Shortform · 2021-12-11T21:39:51.384Z · LW · GW

Right, that's essentially what I mean. You're of course right that this doesn't let you get around the existence of nonmeasurepreserving automorphisms. I guess what I'm saying is that, if you're trying to find a prior on , you should try to think about what system of finite measurements this is idealizing, and see if you can apply a symmetry argument to those bits. Which isn't always the case! You can only apply the principle of indifference if you're actually indifferent. But if it's the case that is generated in a way where "there's no reason to suspect that any bit should be rather than , or that there should be correlations between the bits", then it's of course not the case that has this same property. But of course you still need to look at the actual situation to see what symmetry exists or doesn't exist.

Haar measures are a high-powered way of doing this, I was just thinking about taking the iterated product measure of the uniform probability measure on (justified by symmetry considerations). You can of course find maps from to all sorts of spaces but it seems harder to transport symmetry considerations along these maps.

Comment by Eigil Rischel (eigil-rischel) on The Promise and Peril of Finite Sets · 2021-12-11T18:04:11.009Z · LW · GW

I strongly recommend Escardó's Seemingly impossible functional programs, which constructs a function search : ((Nat -> Bool) -> Bool) -> (Nat -> Bool) which, given a predicate on infinite bitstrings, finds an infinite bitstring satisfying the predicate if one exists. (In other words, if p : (Nat -> Bool) -> Bool and any bitstring at all satisfies p, then p (search p) == True

(Here I'm intending Nat to be the type of natural numbers, of course) .

Comment by Eigil Rischel (eigil-rischel) on davidad's Shortform · 2021-12-11T17:54:38.215Z · LW · GW

Ha, I was just about to write this post. To add something, I think you can justify the uniform measure on bounded intervals of reals (for illustration purposes, say ) by the following argument: "Measuring a real number " is obviously simply impossible if interpreted literally, containing an infinite amount of data. Instead this is supposed to be some sort of idealization of a situation where you can observe "as many bits as you want" of the binary expansion of the number (choosing another base gives the same measure). If you now apply the principle of indifference to each measured bit, you're left with Lebesgue measure.

It's not clear that there's a "right" way to apply this type of thinking to produce "the correct" prior on (or or any other non-compact space.

Comment by Eigil Rischel (eigil-rischel) on I’m no longer sure that I buy dutch book arguments and this makes me skeptical of the "utility function" abstraction · 2021-06-22T08:57:19.674Z · LW · GW

This seems somewhat connected to this previous argument. Basically, coherent agents can be modeled as utility-optimizers, yes, but what this really proves is that almost any behavior fits into the model "utility-optimizer", not that coherent agents must necessarily look like our intuitive picture of a utility-optimizer.

Paraphrasing Rohin's arguments somewhat, the arguments for universal convergence say something like "for "most" "natural" utility functions, optimizing that function will mean acquiring power, killing off adversaries, acquiring resources, etc". We know that all coherent behavior comes from a utility function, but it doesn't follow that most coherent behavior exhibits this sort of power-seeking.

Comment by Eigil Rischel (eigil-rischel) on We need a standard set of community advice for how to financially prepare for AGI · 2021-06-07T10:25:29.550Z · LW · GW

My impression from skimming a few AI ETFs is that they are more or less just generic technology ETFs with different branding and a few random stocks thrown in. So they're not catastrophically worse than the baseline "Google, Microsoft and Facebook" strategy you outlined, but I don't think they're better in any real way either.

Comment by Eigil Rischel (eigil-rischel) on Finite Factored Sets · 2021-06-04T10:07:04.390Z · LW · GW

This is really cool!

The example of inferring from the independence of and reminds me of some techniques discussed in Elements of Causal Inference. They discuss a few different techniques for 2-variable causal inference.

One of them, which seems to be essentially analogous to this example, is that if are real-valued variables, then if the regression error (i.e for some constant ) is independent of , it's highly likely that is downstream of . It sounds like factored sets (or some extension to capture continuous-valued variables) might be the right general framework to accommodate this class of examples.

Comment by Eigil Rischel (eigil-rischel) on Finite Factored Sets · 2021-06-03T14:58:39.773Z · LW · GW

Thanks (to both of you), this was confusing for me as well.

Comment by Eigil Rischel (eigil-rischel) on The Fall of Rome: Why It's Relevant, And Why We're Mistaken · 2021-04-23T13:03:57.332Z · LW · GW

At least one explanation for the fact that the Fall of Rome is the only period of decline on the graph could be this: data becomes more scarce the further back in history you go. This has the effect of smoothing the historical graph as you extrapolate between the few datapoints you have. Thus the overall positive trend can more easily mask any short-term period of decay.

Comment by Eigil Rischel (eigil-rischel) on What weird beliefs do you have? · 2021-04-14T13:51:22.000Z · LW · GW

Lsusr ran a survey here a little while ago, asking people for things that "almost nobody agrees with you on". There's a summary here

Comment by Eigil Rischel (eigil-rischel) on Unconvenient consequences of the logic behind the second law of thermodynamics · 2021-03-12T12:26:48.755Z · LW · GW

This argument proves that

  • Along a given time-path, the average change in entropy is zero
  • Over the whole space of configurations of the universe, the average difference in entropy between a given state and the next state (according to the laws of physics) is zero. (Really this should be formulated in terms of derivatives, not differences, but you get the point).

This is definitely true, and this is an inescapable feature of any (compact) dynamical system. However, somewhat paradoxically, it's consistent with the statement that, conditional on any given (nonmaximal) level of entropy, the vast majority of states have increasing entropy.

In your time-diagrams, this might look something like this:

image

I.e when you occasionally swing down into a somewhat low-entropy state, it's much more likely that you'll go back to high-entropy than that you'll go further down. So once you observe that you're not in the maxentropy state, it's more likely that you'll increase than that you'll decrease.

(It's impossible for half of the mid-entropy states to continue to low-entropy states, because there are much more than twice as many mid-entropy states as low-entropy states, and the dynamics are measure-preserving).

Comment by Eigil Rischel (eigil-rischel) on Kelly *is* (just) about logarithmic utility · 2021-03-04T15:24:28.549Z · LW · GW

This argument doesn't work because limits don't commute with integrals (including expected values). (Since practical situations are finite, this just tells you that the limiting situation is not a good model).

To the extent that the experiment with infinite bets makes sense, it definitely has EV 0. We can equip the space with a probability measure corresponding to independent coinflips, then describe the payout using naive EV maximization as a function - it is on the point and everywhere else. The expected value/integral of this function is zero.

EDIT: To make the "limit" thing clear, we can describe the payout after bets using naive EV maximization as a function , which is if the first values are , and otherwise. Then , and (pointwise), but .

The corresponding functions corresponding to the EV using a Kelly strategy have for all , but

Comment by Eigil Rischel (eigil-rischel) on Kelly *is* (just) about logarithmic utility · 2021-03-03T15:16:42.941Z · LW · GW

The source of disagreement seems to be about how to compute the EV "in the limit of infinite bets". I.e given bets with a chance of winning each, where you triple your stake with each bet, the naive EV maximization strategy gives you a total expect value of , which is also the maximum achievable overall EV. Does this entail that the EV at infinite bets is ? No, because with probability one, you'll lose one of the bets and end up with zero money.

I don't find this argument for Kelly super convincing.

  • You can't actually bet an infinite number of times, and any finite bound on the number of bets, even if it's , immediately collapses back to the above situation where naive EV-maximization also maximizes the overall expected value. So this argument doesn't actually support using Kelly over naive EV maximization in real life.

  • There are tons of strategies other than Kelly which achieve the goal of infinite EV in the limit. Looking at EV in the limit doesn't give you a way of choosing between these. You can compare them over finite horizons and notice that Kelly gives you better EV than others here (maximal geometric growth rate).... but then we're back to the fact that over finite time horizons, naive EV does even better than any of those.

Comment by Eigil Rischel (eigil-rischel) on The slopes to common sense · 2021-03-01T18:36:32.126Z · LW · GW

I don't wanna clutter the comments too much, so I'll add this here: I assume there was supposed to be links to the various community discussions of Why We Sleep (hackernews, r/ssc, etc), but these are just plain text for me.

Comment by Eigil Rischel (eigil-rischel) on Has anybody used quantification over utility functions to define "how good a model is"? · 2021-02-11T13:59:20.621Z · LW · GW

(John made a post, I'll just post this here so others can find it: https://www.lesswrong.com/posts/Dx9LoqsEh3gHNJMDk/fixing-the-good-regulator-theorem)

Comment by Eigil Rischel (eigil-rischel) on Making Vaccine · 2021-02-04T10:24:35.959Z · LW · GW

This seems prima facie unlikely. If you're not worried about the risk of side effects from the "real" vaccine, why not just take it, too (since the efficacy of the homemade vaccine is far from certain)?. On the other hand, if you're the sort of person who worries about the side effects of a vaccine that's been through clinical trials, you're probably not the type to brew something up in your kitchen based on a recipe that you got off the internet and snort it.

Comment by Eigil Rischel (eigil-rischel) on Recognizing Numbers · 2021-01-23T14:32:01.750Z · LW · GW

This is great!

An idea which has picked up some traction in some circles of pure mathematicians is that numbers should be viewed as the "shadow" of finite sets, which is a more fundamental notion.

You start with the notion of finite set, and functions between them. Then you "forget" the difference between two finite sets if you can match the elements up to each other (i.e if there exists a bijection). This seems to be vaguely related to your thing about being invariant under permutation - if a property of a subset of positions (i.e those positions that are sent to 1), is invariant under bijections (i.e permutations) of the set of positions, it can only depend on the size/number of the subset.

See e.g the first ~2 minutes of this lecture by Lars Hesselholt (after that it gets very technical)

Comment by Eigil Rischel (eigil-rischel) on Great minds might not think alike · 2021-01-02T17:24:45.544Z · LW · GW

My mom is a translator (mostly for novels), and as far as I know she exclusively translates into Danish (her native language). I think this is standard in the industry - it's extremely hard to translate text in a way that feels natural in the target language, much harder than it is to tease out subtleties of meaning from the source language.

Comment by Eigil Rischel (eigil-rischel) on Being the (Pareto) Best in the World · 2020-12-18T16:50:19.226Z · LW · GW

This post introduces a potentially very useful model, both for selecting problems to work on and for prioritizing personal development. This model could be called "The Pareto Frontier of Capability". Simply put:

  1. By an efficient markets-type argument, you shouldn't expect to have any particularly good ways of achieving money/status/whatever - if there was an unusually good way of doing that, somebody else would already be exploiting it.
  2. The exception to this is that if only a small amount of people can exploit an opportunity, you may have a shot. So you should try to acquire skills that only a small number of people have.
  3. Since there are a lot of people in the world, it's incredibly hard to become among the best in the world at any particular skill.
  4. This means you should position yourself on the Pareto Frontier - you should seek out a combination of skills where nobody else is better than you at everything. Then you will have the advantage in problems where all these skills matter.

It might be important to contrast this with the economical term comparative advantage, which is often used informally in a similar context. But its meaning is different. If we are both excellent programmers, but you are also a great writer, while I suck at writing, I have a comparative advantage in programming. If we're working on a project together where both writing and programming are relevant, it's best if I do as much programming as possible while you handle as much as the writing as possible - even though you're as good at me as programming, if someone has to take off time from programming to write, it should be you. This collaboration can make you more effective even though you're better at everything than me (in the economics literature this is usually conceptualized in terms of nations trading with each other).

This is distinct from the Pareto optimality idea explored in this post. Pareto optimality matters when it's important that the same person does both the writing and the programming. Maybe we're writing a book to teach programming. Then even if I am actually better than you at programming, and Bob is much better than you at writing (but sucks at programming), you would probably be the best person for the job.

I think the Pareto frontier model is extremely useful, and I have used it to inform my own research strategy.

While rereading this post recently, I was reminded of a passage from Michael Nielsen's Principles of Effective Research:

Say some new field opens up that combines field X and field Y. Researchers from each of these fields flock to the new field. My experience is that virtually none of the researchers in either field will systematically learn the other field in any sort of depth. The few who do put in this effort often achieve spectacular results.

Comment by Eigil Rischel (eigil-rischel) on Demystifying the Second Law of Thermodynamics · 2020-11-23T09:45:16.381Z · LW · GW

I hadn't, thanks!

I took the argument about the large-scale "stability" of matter from Jaynes (although I had to think a bit before I felt I understood it, so it's also possible that I misunderstood it).

I think I basically agree with Eliezer here?

The Second Law of Thermodynamics is actually probabilistic in nature - if you ask about the probability of hot water spontaneously entering the "cold water and electricity" state, the probability does exist, it's just very small. This doesn't mean Liouville's Theorem is violated with small probability; a theorem's a theorem, after all. It means that if you're in a great big phase space volume at the start, but you don't know where, you may assess a tiny little probability of ending up in some particular phase space volume. So far as you know, with infinitesimal probability, this particular glass of hot water may be the kind that spontaneously transforms itself to electrical current and ice cubes. (Neglecting, as usual, quantum effects.)

So the Second Law really is inherently Bayesian. When it comes to any real thermodynamic system, it's a strictly lawful statement of your beliefs about the system, but only a probabilistic statement about the system itself.

The reason we can be sure that this probability is "infinitesimal" is that macrobehavior is deterministic. We can easily imagine toy systems where entropy shrinks with non-neglible probability (but, of course, still grows /in expectation/). Indeed, if the phase volume of the system is bounded, it will return arbitrarily close to its initial position given enough time, undoing the growth in entropy - the fact that these timescales are much longer than any we care about is an empirical property of the system, not a general consequence of the laws of physics.

To put it another way: if you put an ice cube in a glass of hot water, thermally insulated, it will melt - but after a very long time, the ice cube will coalesce out of the water again. It's a general theorem that this must be less likely than the opposite - ice cubes melt more frequently than water "demelts" into hot water and ice, because ice cubes in hot water occupies less phase volume. But the ratio between these two can't be established by this sort of general argument. To establish that water "demelting" is so rare that it may as well be impossible, you have to either look at the specific properties of the water system (high number of particles the difference in phase volume is huge), or make the sort of general argument I tried to sketch in the post.

Comment by Eigil Rischel (eigil-rischel) on Demystifying the Second Law of Thermodynamics · 2020-11-22T16:22:24.380Z · LW · GW

This may be poorly explained. The point here is that

  • is supposed to be always well-defined. So each state has a definite next state (since X is finite, this means it will eventually cycle around).
  • Since is well-defined and bijective, each is for exactly one .
  • We're summing over every , so each also appears on the list of s (by the previous point), and each also appears on the list of s (since it's in )

E.g. suppose and when , and . Then is . But - these are the same number.

Comment by Eigil Rischel (eigil-rischel) on Biextensional Equivalence · 2020-10-29T22:32:07.318Z · LW · GW

But then shouldn't there be a natural biextensional equivalence ? Suppose , and denote . Then the map is clear enough, it's simply the quotient map. But there's not a unique map - any section of the quotient map will do, and it doesn't seem we can make this choice naturally.

I think maybe the subcategory of just "agent-extensional" frames is reflective, and then the subcategory of "environment-extensional" frames is coreflective. And there's a canonical (i.e natural) zig-zag

Comment by Eigil Rischel (eigil-rischel) on Biextensional Equivalence · 2020-10-29T13:24:17.396Z · LW · GW

Does the biextensional collapse satisfy a universal property? There doesn't seem to be an obvious map either or (in each case one of the arrows is going the wrong way), but maybe there's some other way to make it universal?

Comment by Eigil Rischel (eigil-rischel) on Efficient Market Frontier · 2020-08-24T11:01:55.739Z · LW · GW

Right, that's a good point.

Comment by Eigil Rischel (eigil-rischel) on Efficient Market Frontier · 2020-08-23T18:29:37.519Z · LW · GW

What do you think about "cognitive biases as an edge"?

One story we can tell about the markets and coronavirus is this: It was not hard to come to the conclusion, by mid-to-late February, that a global COVID-19 pandemic was extremely likely, and that it was highly probable it would cause a massive catastrophe in the US. A few people managed to take this evidence seriously enough to trade on it, and made a killing, but the vast majority of the market simply didn't predict this fairly predictable course of events. Why not? Because it didn't feel like the sort of thing that could happen. It was too outlandish, too weird.

If we take this story seriously, it makes sense to look for other things like this - cases where the probability of something is being underestimated - because it seems too weird, because it's too unpleasant to think about, or for some other reason.

For example, metaculus currently estimates something like a 15% chance that Trump loses the election and refuses to concede. I we take that probability seriously, and assume something like that is likely to lead to riots, civil unrest, uncertainty, etc, would it make sense to try and trade on that? On the assumption that this is not priced in, because the possibility of this sort of crisis is not something that the market knows how to take seriously?

Comment by Eigil Rischel (eigil-rischel) on Tips/tricks/notes on optimizing investments · 2020-08-22T11:34:08.132Z · LW · GW
  • What are some reputable activist short-sellers?
  • Where do you go to identify Robinhood bubbles? (Maybe other than "lurk r/wallstreetbets and inverse whatever they're hyping").

I guess this question is really a general question about where you go for information about the market, in a general sense. Is it just reading a lot of "market news" type sites?

Comment by Eigil Rischel (eigil-rischel) on How much is known about the "inference rules" of logical induction? · 2020-08-10T20:20:27.657Z · LW · GW

Thank you very much!

I guess an argument of this type rules out a lot of reasonable-seeming inference rules - if a computable process can infer "too much" about universal statements from finite bits of evidence, you do this sort of Gödel argument and derive a contradiction. This makes a lot of sense, now that I think about it.

Comment by Eigil Rischel (eigil-rischel) on What are the best tools you have seen to keep track of knowledge around testable statements? · 2020-07-18T12:58:42.309Z · LW · GW

There is also predictionbook, which seems to be a similar sort of thing.

Of course, there's also metaculus, but that's more of a collaborative prediction aggregator, not so much a personal tool for tracking your own predictions.

Comment by Eigil Rischel (eigil-rischel) on We run the Center for Applied Rationality, AMA · 2020-07-16T12:37:48.297Z · LW · GW

If anyone came across this comment in the future - the CFAR Participant Handbook is now online, which is more or less the answer to this question.

Comment by Eigil Rischel (eigil-rischel) on Transportation as a Constraint · 2020-04-06T07:34:23.186Z · LW · GW

The Terra Ignota sci-fi series by Ada Palmer depicts a future world which is also driven by "slack transportation". The mechanism, rather than portals, is a super-cheap global network of autonomous flying cars (I think they're supposed to run on nuclear engines? The technical details are not really developed). It's a pretty interesting series, although it doesn't explore the practical implications so much as the political/sociological ones (and this is hardly the only thing driving the differences between the present world and the depicted future)

Comment by Eigil Rischel (eigil-rischel) on Category Theory Without The Baggage · 2020-02-03T23:01:08.503Z · LW · GW

I think, rather than "category theory is about paths in graphs", it would be more reasonable to say that category theory is about paths in graphs up to equivalence, and in particular about properties of paths which depend on their relations to other paths (more than on their relationship to the vertices)*. If your problem is most usefully conceptualized as a question about paths (finding the shortest path between two vertices, or counting paths, or something in that genre, you should definitely look to the graph theory literature instead)

* I realize this is totally incomprehensible, and doesn't make the case that there are any interesting problems like this. I'm not trying to argue that category theory is useful, just clarifying that your intuition that it's not useful for problems that look like these examples is right.

Comment by Eigil Rischel (eigil-rischel) on Category Theory Without The Baggage · 2020-02-03T22:05:10.767Z · LW · GW

As an algebraic abstractologist, let me just say this is an absolutely great post. My comments:

Category theorists don't distinguish between a category with two objects and an edge between them, and a category with two objects and two identified edges between them (the latter object doesn't really even make sense in the usual account). In general, the extra equivalence relation that you have to carry around makes certain things more complicated in this version.

I do tend to agree with you that thinking of categories as objects, edges and an equivalence relation on paths is a more intuitive perspective, but let me defend the traditional presentation. By far the most essential/prototypical examples are the categories of sets and functions, or types and functions. Here, it's more natural to speak of functions from x to y, than to speak of "composable sequences of functions beginning at x and ending at y, up to the equivalence relation which identifies two sequences if they have the same composite".

Again, I absolutely love this post. I am frankly a bit shocked that nobody seems to have written an introduction using this language - I think everyone is too enamored with sets as an example.

Comment by Eigil Rischel (eigil-rischel) on Underappreciated points about utility functions (of both sorts) · 2020-01-04T23:25:03.705Z · LW · GW

This is a reasonable way to resolve the paradox, but note that you're required to fix the max number of people ahead of time - and it can't change as you receive evidence (it must be a maximum across all possible worlds, and evidence just restricts the set of possible worlds). This essentially resolves Pascal's mugging by fixing some large number X and assigning probability 0 to claims about more than X people.

Comment by Eigil Rischel (eigil-rischel) on Underappreciated points about utility functions (of both sorts) · 2020-01-04T11:20:23.539Z · LW · GW

Just to sketch out the contradiction between unbounded utilities and gambles involving infinitely many outcomes a bit more explicitly.

If your probability function is unbounded, we can consider the following wager: You win 2 utils with probability 1/2, 4 utils with probability 1/4, and so on. The expected utility of this wager is infinite. (If there are no outcomes with utility exactly 2, 4, etc, we can award more - this is possible because utility is unbounded).

Now consider these wagers on a (fair) coinflip:

  • A: Play the above game if heads, pay out 0 utils if tails
  • B: Play the above game if heads, pay out 100000 utils if tails

(0 and 100000 can be any two non-equal numbers).

Both of these wagers have infinite expected utility, so we must be indifferent between them. But since they agree on heads, and B is strictly preferred to A on tails, we must prefer B (since tails occurs with positive probability)

Comment by Eigil Rischel (eigil-rischel) on We run the Center for Applied Rationality, AMA · 2019-12-20T20:59:41.955Z · LW · GW

Information about people behaving erratically/violently is better at grabbing your brain's "important" sensor? (Noting that I had exactly the same instinctual reaction). This seems to be roughly what you'd expect from naive evopsych (which doesn't mean it's a good explanation, of course)

Comment by Eigil Rischel (eigil-rischel) on We run the Center for Applied Rationality, AMA · 2019-12-19T23:04:40.477Z · LW · GW

CFAR must have a lot of information about the efficacy of various rationality techniques and training methods (compared to any other org, at least). Is this information, or recommendations based on it, available somewhere? Say, as a list of techniques currently taught at CFAR - which are presumably the best ones in this sense. Or does one have to attend a workshop to find out?

Comment by Eigil Rischel (eigil-rischel) on Examples of Causal Abstraction · 2019-12-16T10:51:13.026Z · LW · GW

There's some recent work in the statistics literature exploring similar ideas. I don't know if you're aware of this, or if it's really relevant to what you're doing (I haven't thought a lot about the comparisons yet), but here are some papers.

Comment by Eigil Rischel (eigil-rischel) on Eigil Rischel's Shortform · 2019-11-10T12:33:54.168Z · LW · GW

A thought about productivity systems/workflow optimization:

One principle of good design is "make the thing you want people to do, the easy thing to do". However, this idea is susceptible to the following form of Goodhart: often a lot of the value in some desirable action comes from the things that make it difficult.

For instance, sometimes I decide to migrate some notes from one note-taking system to another. This is usually extremely useful, because it forces me to review the notes and think about how they relate to each other and to the new system. If I make this easier for myself by writing a script to do the work (as I have sometimes done), this important value is lost.

Or think about spaced repetition cards: You can save a ton of time by reusing cards made by other people covering the same material - but the mental work of breaking the material down into chunks that can go into the spaced-repetition system, which is usually very important, is lost.

Comment by Eigil Rischel (eigil-rischel) on The best of the www, in my opinion · 2019-10-17T18:38:26.734Z · LW · GW

This is a great list.

The main criticism I have is that this list overlaps way too much with my own internal list of high-quality sites, making it not very useful.

Comment by Eigil Rischel (eigil-rischel) on Examples of Categories · 2019-10-10T10:55:37.363Z · LW · GW

The example of associativity seems a little strange, I'm note sure what's going on there. What are the three functions that are being composed?

Comment by Eigil Rischel (eigil-rischel) on Computational Model: Causal Diagrams with Symmetry · 2019-10-08T08:19:00.022Z · LW · GW

Should there be an arrow going from n*f(n-1) to f (around n==0?) ? The output of the system also depends on n*f(n-1), not just on whether or not n is zero.

Comment by Eigil Rischel (eigil-rischel) on Occam's Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann · 2019-10-07T22:52:46.855Z · LW · GW

A simple remark: we don't have access to all of , only up until the current time. So we have to make sure that we don't get a degenerate pair which diverges wildly from the actual universe at some point in the future.

Maybe this is similar to the fact that we don't want AIs to diverge from human values once we go off-distribution? But you're definitely right that there's a difference: we do want AIs to diverge from human behaviour (even in common situations).

Comment by Eigil Rischel (eigil-rischel) on Two Dark Side Statistics Papers · 2019-10-02T18:14:04.667Z · LW · GW

I'm curious about the remaining 3% of people in the 97% program, who apparently both managed to smuggle some booze into rehab, and then admitted this to the staff while they were checking out. Lizardman's constant?

Comment by Eigil Rischel (eigil-rischel) on Eigil Rischel's Shortform · 2019-10-02T17:52:38.999Z · LW · GW

I've noticed a sort of tradeoff in how I use planning/todo systems (having experimented with several such systems recently). This mainly applies to planning things with no immediate deadline, where it's more about how to split a large amount of available time between a large number of tasks, rather than about remembering which things to do when. For instance, think of a personal reading list - there is no hurry to read any particular things on it, but you do want to be spending your reading time effectively.

On one extreme, I make a commitment to myself to do all the things on the list eventually. At first, this has the desired effect of making me get things done. But eventually, things that I don't want to do start to accumulate. I procrastinate on these things by working on more attractive items on the list. This makes the list much less useful from a planning perspective, since it's cluttered with a bunch of old things I no longer want to spend time on (which make me feel bad about not doing them whenever I'm looking at the list).

On the other extreme, I make no commitment like that, and remove things from the list whenever I feel like it. This avoids the problem of accumulating things I don't want to do, but makes the list completely useless as a tool for getting me to do boring tasks.

I have a hard time balancing these issues. I'm currently trying an approach to my academic reading list where I keep a mostly unsorted list, and whenever I look at it to find something to read, I have to work on the top item, or remove it from the list. This is hardly ideal, but it mitigates the "stale items" problem, and still manages to provide some motivation, since it feels bad to take items off the list.

Comment by Eigil Rischel (eigil-rischel) on What are your recommendations on books to listen to when doing, e.g., chores? · 2019-09-28T11:35:27.413Z · LW · GW

I found Predictably Irrational, Superforecasting, and Influence to be good.

Comment by Eigil Rischel (eigil-rischel) on Don't clean your glasses · 2019-09-24T08:55:39.056Z · LW · GW

I've managed to implement this for computer monitors, but not for glasses. But my glasses seem to get smudged frequently enough that I need to wipe them about every day anyways. I guess I fidget with them much more than you?