Comment by nick_hay on Standard and Nonstandard Numbers · 2012-12-20T12:02:42.585Z · score: 3 (3 votes) · LW · GW

Very nice. These notes say that every countable nonstandard model of Peano arithmetic is isomorphic, as an ordered set, to the natural numbers followed by lexicographically ordered pairs (r, z), for r a positive rational and z an integer. If I remember rightly, the ordering can be defined in terms of addition: x <= y iff there exists z with x + z = y. So if we want a countable nonstandard model of Peano arithmetic with both the successor function and addition, we need all these nonstandard numbers.
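A minimal Python sketch of that order type, assuming a hypothetical encoding in which a standard natural n is `("std", n)` and a nonstandard element is `("nonstd", (r, z))`. It models only the ordering (plus successor inside each copy of the integers), not addition or multiplication.

```python
from fractions import Fraction
from typing import Tuple

# Hypothetical encoding of the order type N + (Q+ x Z):
#   ("std", n)          -- the standard natural number n
#   ("nonstd", (r, z))  -- the nonstandard element in block r, offset z
Element = Tuple[str, object]


def std(n: int) -> Element:
    if n < 0:
        raise ValueError("standard elements are natural numbers")
    return ("std", n)


def nonstd(r: Fraction, z: int) -> Element:
    if r <= 0:
        raise ValueError("r must be a positive rational")
    return ("nonstd", (Fraction(r), z))


def order_key(x: Element) -> tuple:
    kind, value = x
    if kind == "std":
        return (0, value, 0)  # every standard natural comes first
    r, z = value
    return (1, r, z)          # then pairs (r, z), compared lexicographically


def less_than(x: Element, y: Element) -> bool:
    return order_key(x) < order_key(y)


def successor(x: Element) -> Element:
    kind, value = x
    if kind == "std":
        return std(value + 1)
    r, z = value
    return nonstd(r, z + 1)  # successor stays inside the same copy of Z


a = nonstd(Fraction(1, 2), 10**6)
b = nonstd(Fraction(2, 3), -10**6)
print(less_than(std(10**9), a))  # True
print(less_than(a, b))           # True
# Between blocks r < r' there is always another block, e.g. (r + r') / 2.
print(less_than(nonstd((Fraction(1, 2) + Fraction(2, 3)) / 2, 0), b))  # True
```

The density of the block indices is the point: no block is adjacent to another, which is why the rationals (rather than the integers) index the copies of Z.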

It seems that if we only care about Peano arithmetic with the successor function, then the naturals followed by a single copy of the integers is a model. If I were trying to prove this, I'd argue that, looking only at the successor function, no first-order formula could distinguish an element of the copy of the integers from a very large standard natural number, by standard first-order locality results.
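A small Python sketch of the naturals-plus-one-copy-of-Z structure, assuming a hypothetical `Elem` encoding. It spot-checks only the non-induction successor axioms on a finite sample; the induction schema cannot be checked by finite testing, which is where the locality argument comes in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Elem:
    """Hypothetical encoding of N + Z, the naturals followed by one copy of Z.

    standard=True: the natural number `value` (value >= 0).
    standard=False: the integer `value` in the nonstandard copy of Z.
    """
    standard: bool
    value: int


ZERO = Elem(True, 0)


def succ(x: Elem) -> Elem:
    # Successor never leaves the part (standard or nonstandard) it starts in.
    return Elem(x.standard, x.value + 1)


def pred(x: Elem) -> Optional[Elem]:
    # Zero is the only element without a predecessor; the nonstandard
    # copy of Z extends infinitely in both directions.
    if x == ZERO:
        return None
    return Elem(x.standard, x.value - 1)


def check_successor_axioms(sample) -> bool:
    """Spot-check the non-induction axioms on a finite sample of elements."""
    for x in sample:
        if succ(x) == ZERO:  # zero is not a successor
            return False
        if pred(succ(x)) != x:  # successor is injective
            return False
        if x != ZERO and succ(pred(x)) != x:  # every nonzero element is a successor
            return False
    return True


sample = [Elem(True, n) for n in range(50)] + [Elem(False, z) for z in range(-50, 50)]
print(check_successor_axioms(sample))  # True
```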

Comment by nick_hay on Standard and Nonstandard Numbers · 2012-12-20T07:00:03.336Z · score: 2 (2 votes) · LW · GW

Fascinating. I thought Tennenbaum's theorem implied nonstandard models were more or less impossible to visualize. The nonstandard model of Peano arithmetic illustrated in the diagram only gives the successor relation; there's no definition of addition and multiplication. Tennenbaum's theorem implies there's no computable way to define them, but is there a proof that they can be defined at all for this particular model?

Comment by nick_hay on Review of Lakoff & Johnson, 'Philosophy in the Flesh' · 2011-11-07T00:21:24.562Z · score: 1 (1 votes) · LW · GW

The chapter on Chomsky contrasts the generative grammar approach, which Lakoff used to work within, with the cognitive-science-inspired cognitive linguistics approach, which Lakoff has been working in for the last few decades. Cognitive linguistics includes cognitive semantics, which is rather different from generative semantics.

Comment by nick_hay on Review of Lakoff & Johnson, 'Philosophy in the Flesh' · 2011-11-06T23:17:38.306Z · score: 0 (0 votes) · LW · GW

I largely agree with your critique, but more as a description of a different book that could have been written in this book's place. For example, a book on philosophy applying the results of this book's methodology, of which chapter 25 is a poor substitute. Or books drilling into one particular area in more detail with careful connections to the literature. This book serves better as an inspiring manifesto.

"While these chapters are enlightening, they depend too heavily on the earlier account of metaphor, rarely draw upon other findings in cognitive science that are likely relevant, are sparse in scientific citations, and (as I've said) rarely cite actual philosophers claiming the things they say that philosophers claim."

Why is the dependence on the earlier theory of metaphor a problem?

Do you think the authors misrepresent what philosophers claim, in those chapters addressing philosophy (15-24) rather than (informal) philosophical ideas (9-14)?

Comment by nick_hay on Procedural Knowledge Gaps · 2011-02-08T07:38:10.518Z · score: 3 (3 votes) · LW · GW

If the goal in exercise is to lose weight, have you tried replacing carbohydrates with fat in your diet? Forcing yourself to exercise will serve to work up an appetite and make you hungry, but not to lose weight. There is a correlation between exercising and being thin, but the causality is generally perceived the wrong way around. There is also a correlation between exercising and (temporarily) losing weight, but that is confounded by diet changes, which typically involve reducing carbohydrate intake.

I've heard you mention Gary Taubes's work, but not that you've read it. If you haven't read his book, he has a new, shorter one which is well worth reading, linked here: The appendix has specific diet recommendations. Also good are these notes:

Comment by nick_hay on Berkeley LW Meet-up Saturday November 6 · 2010-11-06T02:55:54.795Z · score: 0 (0 votes) · LW · GW

The T-rex is in the Valley Life Sciences Building. There are a few other fossils there too.

Comment by nick_hay on Fundamentally Flawed, or Fast and Frugal? · 2009-12-21T22:52:01.392Z · score: 1 (1 votes) · LW · GW

Idealized Bayesians don't have to be logically omniscient -- they can have a prior which assigns probability to logically impossible worlds.

Comment by nick_hay on Auckland meet up Saturday Nov 28th · 2009-11-15T07:26:07.738Z · score: 0 (0 votes) · LW · GW

I would be there, but I'm not back in NZ until 16th December! Everyone else should definitely go.

Comment by nick_hay on Expected utility without the independence axiom · 2009-10-29T02:08:58.398Z · score: 8 (8 votes) · LW · GW

The von Neumann-Morgenstern axioms talk only about preferences over lotteries, which are simply probability distributions over outcomes. That is, you have an unstructured set O of outcomes, and a total preorder over Dist(O), the set of probability distributions over O. They do not talk about a utility function. This is quite elegant: to make decisions you must have preferences over distributions over outcomes, but you don't need to assume that O has a certain structure, e.g. that of the reals.

The expected utility theorem says that preferences which satisfy the first four axioms are exactly those which can be represented by:

A <= B iff E[U;A] <= E[U;B]

for some utility function U: O -> R, where

E[U;A] = \sum_{o \in O} A(o) U(o)

However, U is only defined up to positive affine transformation, i.e. aU + b works equally well for any a > 0 and any real b. In particular, you can amplify the standard deviation of the utility as much as you like by redefining U.
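A minimal Python sketch of the representation above, with hypothetical outcomes and utilities chosen purely for illustration. It shows that a positive affine transform V = 10U + 7 leaves every preference between lotteries unchanged while multiplying the standard deviation of utility by 10.

```python
from typing import Callable, Dict, Hashable

Lottery = Dict[Hashable, float]  # outcome -> probability


def expected_utility(U: Callable[[Hashable], float], lottery: Lottery) -> float:
    """E[U; A] = sum over outcomes o of A(o) U(o)."""
    return sum(p * U(o) for o, p in lottery.items())


def weakly_prefers(U, A: Lottery, B: Lottery) -> bool:
    """A <= B iff E[U; A] <= E[U; B]."""
    return expected_utility(U, A) <= expected_utility(U, B)


def utility_std(U, lottery: Lottery) -> float:
    mean = expected_utility(U, lottery)
    return sum(p * (U(o) - mean) ** 2 for o, p in lottery.items()) ** 0.5


# Hypothetical outcomes and utilities, purely for illustration.
base = {"apple": 0.0, "banana": 1.0, "cherry": 3.0}
U = base.__getitem__


def V(o):
    return 10 * U(o) + 7  # positive affine transform: a = 10, b = 7


A = {"apple": 0.5, "cherry": 0.5}
B = {"banana": 1.0}

print(weakly_prefers(U, B, A), weakly_prefers(V, B, A))  # True True
print(utility_std(U, A), utility_std(V, A))              # 1.5 15.0
```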

Your axioms require you to pick a particular representation of U for them to make sense. How do you choose this U? Even with a mechanism for choosing U, e.g. assume bounded nontrivial preferences and pick the unique U such that \sup_x U(x) = 1 and \inf_x U(x) = 0, this is still less elegant than talking directly about lotteries.
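A sketch of that normalization mechanism, assuming a finite outcome set so that sup and inf become max and min (the outcome names are hypothetical). Any two utilities related by a positive affine transform normalize to the same function, which is what makes the choice unique.

```python
from typing import Callable, Dict, Hashable, Iterable


def normalize(U: Callable[[Hashable], float],
              outcomes: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Rescale U so its max is 1 and its min is 0 over a finite outcome set."""
    outcomes = list(outcomes)
    lo = min(U(o) for o in outcomes)
    hi = max(U(o) for o in outcomes)
    if hi == lo:
        raise ValueError("trivial preferences: every outcome is equally good")
    return {o: (U(o) - lo) / (hi - lo) for o in outcomes}


# Hypothetical utilities; V is a positive affine transform of U.
base = {"apple": 0.0, "banana": 1.0, "cherry": 3.0}
U = base.__getitem__


def V(o):
    return 10 * U(o) + 7


print(normalize(U, base))
print(normalize(V, base))
```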

Can you redefine your axioms to talk only about lotteries over outcomes?

Comment by nick_hay on Extreme risks: when not to use expected utility · 2009-10-23T22:35:07.495Z · score: 1 (1 votes) · LW · GW

To be concrete, suppose you want to maximise the average utility people have, but you also care about fairness, so, all else being equal, you prefer individual utilities to be clustered about their average. Then maybe your real utility function is not

U = (U[1] + ... + U[n])/n

but

U' = U - ((U[1]-U)^2 + ... + (U[n]-U)^2)/n

which is in some sense a mean minus a variance.
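A minimal Python sketch of that "mean minus a variance" utility, using the unweighted form. A real version might scale the variance term by some trade-off coefficient; that coefficient would be an assumption not in the comment.

```python
from typing import Sequence


def mean_utility(us: Sequence[float]) -> float:
    """U = (U[1] + ... + U[n]) / n"""
    return sum(us) / len(us)


def fairness_adjusted_utility(us: Sequence[float]) -> float:
    """U' = U - ((U[1]-U)^2 + ... + (U[n]-U)^2) / n, a mean minus a variance."""
    m = mean_utility(us)
    return m - sum((u - m) ** 2 for u in us) / len(us)


equal = [5.0, 5.0, 5.0, 5.0]
unequal = [0.0, 10.0, 0.0, 10.0]
print(mean_utility(equal), mean_utility(unequal))                            # 5.0 5.0
print(fairness_adjusted_utility(equal), fairness_adjusted_utility(unequal))  # 5.0 -20.0
```

Both populations have the same average, so U cannot tell them apart, while U' prefers the equal one.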

Comment by nick_hay on Extreme risks: when not to use expected utility · 2009-10-23T22:24:33.927Z · score: 3 (3 votes) · LW · GW

Can you translate your complaint into a problem with the independence axiom in particular?

Your second example is not a problem of variance in final utility, but of aggregation of utility. Utility theory doesn't force "Giving 1 util to N people" to be equivalent to "Giving N utils to 1 person". That is, it doesn't force your utility U to be equal to U1 + U2 + ... + UN, where Ui is the "utility for person i".
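To illustrate, here is a Python sketch comparing two aggregation rules. The square-root rule is a hypothetical alternative picked only because it is concave; it assumes nonnegative per-person utilities. Additive aggregation makes the two distributions equivalent, the concave one does not, and utility theory permits either.

```python
import math
from typing import Sequence


def additive(us: Sequence[float]) -> float:
    """Total utility U1 + U2 + ... + UN."""
    return sum(us)


def concave(us: Sequence[float]) -> float:
    """Hypothetical alternative: sum of square roots, which favours spreading utility."""
    return sum(math.sqrt(u) for u in us)


N = 4
spread = [1.0] * N                           # 1 util to each of N people
concentrated = [float(N)] + [0.0] * (N - 1)  # N utils to 1 person

print(additive(spread), additive(concentrated))  # 4.0 4.0
print(concave(spread), concave(concentrated))    # 4.0 2.0
```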

Comment by nick_hay on Nonparametric Ethics · 2009-06-21T22:23:36.050Z · score: 6 (6 votes) · LW · GW

Your use of the terms parametric vs. nonparametric doesn't seem to match that of people working in nonparametric Bayesian statistics, where the distinction is more like whether your statistical model has a fixed, finite number of parameters or has no such bound. Methods such as Dirichlet processes and their many variants (hierarchical DP, HDP-HMM, etc.) go beyond simple modeling of surface similarities using similarity of neighbours.
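To make the "no fixed bound on parameters" point concrete, here is a Python sketch of the Chinese restaurant process, the sequential form of a Dirichlet process. The number of clusters is not fixed in advance; its expectation grows roughly like alpha times log n. The function name and the choice alpha = 1.0 are illustrative.

```python
import random
from typing import List, Tuple


def chinese_restaurant_process(n: int, alpha: float,
                               rng: random.Random) -> Tuple[List[int], List[int]]:
    """Sample cluster assignments for n items from a CRP with concentration alpha.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and starts a new cluster with probability alpha / (i + alpha).
    """
    counts: List[int] = []
    assignments: List[int] = []
    for i in range(n):
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            counts.append(1)  # r fell in the final alpha-sized slice: new cluster
            assignments.append(len(counts) - 1)
    return assignments, counts


rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    _, counts = chinese_restaurant_process(n, alpha=1.0, rng=rng)
    print(n, len(counts))  # number of clusters grows with the data
```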

See, for example, this list of publications coauthored by Michael Jordan:

Comment by nick_hay on That You'd Tell All Your Friends · 2009-03-02T02:28:44.393Z · score: 6 (6 votes) · LW · GW

Thou Art Godshatter: gives an intuitive grasp of why and how human morality is complex, but also of why not just any complex thing will do.

Comment by nick_hay on Issues, Bugs, and Requested Features · 2009-02-28T08:49:09.959Z · score: 6 (6 votes) · LW · GW

How about buttons "High quality", "Low quality", "Accurate", "Inaccurate". We're increasing options here, but there's probably a nice way to design the interface to reduce the cognitive load.

Using the word "vote" seems broken here more generally -- we aren't implementing some democratic process, we're aggregating judgments (read: collecting evidence) across a population.
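One hypothetical way to aggregate judgments on two separate axes, sketched in Python. Treating each button press as a binary judgment and summarizing with Laplace's rule of succession (the posterior mean under a uniform Beta(1, 1) prior) is an assumption for illustration; the class names are made up. Keeping the axes separate means a high-quality but inaccurate post stays visible as such.

```python
from dataclasses import dataclass, field


@dataclass
class AxisTally:
    """Counts of positive and negative judgments on one axis."""
    positive: int = 0
    negative: int = 0

    def record(self, is_positive: bool) -> None:
        if is_positive:
            self.positive += 1
        else:
            self.negative += 1

    def estimate(self) -> float:
        # Posterior mean of a Beta(1, 1) prior (Laplace's rule of succession).
        return (self.positive + 1) / (self.positive + self.negative + 2)


@dataclass
class PostJudgments:
    """Hypothetical per-post record keeping quality and accuracy separate."""
    quality: AxisTally = field(default_factory=AxisTally)
    accuracy: AxisTally = field(default_factory=AxisTally)


post = PostJudgments()
for is_high_quality in (True, True, True, False):
    post.quality.record(is_high_quality)
for is_accurate in (False, False):
    post.accuracy.record(is_accurate)

print(post.quality.estimate(), post.accuracy.estimate())
```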

Comment by nick_hay on Issues, Bugs, and Requested Features · 2009-02-28T08:44:52.888Z · score: 5 (5 votes) · LW · GW

Because quality and truth are separate judgments in practice, and forcing them to be conflated into a single scale is losing information. To the extent that truth is positively correlated with quality this will fall out automatically: highly truthy posts will tend to have high quality. Low quality and high truth are not opposites.

Comment by nick_hay on The Thing That I Protect · 2009-02-08T05:03:26.000Z · score: 1 (1 votes) · LW · GW

Z. M. Davis: Good point, I was brushing that distinction under the rug. From this perspective all people arguing about values are trying to change someone's value computation, to a greater or lesser degree; i.e., this is not the place to look if you want to discriminate between "liberal" and "conservative".

With the obvious way to implement a CEV, you start by modeling a population of actual humans (e.g. Earth's), then consider extrapolations of these models (know more, thought faster, etc). No "wipe culturally-defined values" step, however that would be defined.

Where was it suggested otherwise?

Comment by nick_hay on The Thing That I Protect · 2009-02-08T03:53:07.000Z · score: 1 (1 votes) · LW · GW

Ian C: neither group is changing human values as the term is used here: everyone is still human, and no one is suggesting neurosurgery to change how brains compute value. See the post "Value is Fragile".

Comment by nick_hay on Continuous Improvement · 2009-01-11T23:56:30.000Z · score: 4 (4 votes) · LW · GW

Interestingly, you can have unboundedly many children with only quadratic population growth, so long as they are exponentially spaced. For example, give each newborn sentient a resource token, which can be used after the age of maturity (say, 100 years or so) to fund a child. Additionally, in the years 2^i every living sentient is given an extra resource token. One can show there is at most quadratic growth in the number of resource tokens. By adjusting the exponent in 2^i we can get growth O(n^{1+p}) for any nonnegative real p.

Comment by nick_hay on What I Think, If Not Why · 2008-12-12T02:57:00.000Z · score: 1 (1 votes) · LW · GW

Phil: Yes. CEV completely replaces and overwrites itself, by design. Before this point it does not interact with the external world to change it in a significant sense (it cannot avoid all change; e.g. its computer will add tiny vibrations to the Earth, as all computers do). It executes for a while then overwrites itself with a computer program (skipping every intermediate step here). By default, and if anything goes wrong, this program is "shutdown silently, wiping the AI system clean."

(When I say "CEV" I really mean a FAI which satisfies the spirit behind the extremely partial specification given in the CEV document. The CEV document says essentially nothing of how to implement this specification.)

Comment by nick_hay on The Nature of Logic · 2008-11-18T04:56:55.000Z · score: 2 (2 votes) · LW · GW

Personally, I prefer the longer posts.

Comment by nick_hay on Expected Creative Surprises · 2008-10-25T08:53:59.000Z · score: 1 (1 votes) · LW · GW

guest: right, so with those definitions you are overconfident if you are surprised more than you expected and underconfident if you are surprised less, calibration being how close your surprisal is to your expectation of it.

Comment by nick_hay on Expected Creative Surprises · 2008-10-25T08:03:47.000Z · score: 1 (1 votes) · LW · GW

I think there's a sign error in my post -- it should be C(x0) = \log p(x0) + H(p).

Comment by nick_hay on Expected Creative Surprises · 2008-10-25T08:00:23.000Z · score: 1 (1 votes) · LW · GW

Anon: no, I mean the log probability. In your example the calibration will always be close to zero (i.e. good), using base-2 logs: - \log 0.499 - H(p) ~= 0.00289 each time you see tails, and - \log 0.501 - H(p) ~= -0.00288 each time you see heads. It's continuous.

Let's be specific. We have H(p) = - \sum_x p(x) \log p(x), where p is some probability distribution over a finite set. If we observe x0, we say the predictor's calibration is

C(x0) = \sum_x p(x) \log p(x) - \log p(x0) = - \log p(x0) - H(p)

so the expected calibration is 0 by the definition of H(p). The calibration is continuous in p. If \log p(x0) is higher than the expected value of \log p(x), then we are underconfident and C(x0) < 0; if \log p(x0) is lower than expected, we are overconfident and C(x0) > 0.
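A minimal Python sketch of C(x0) = -log p(x0) - H(p) as defined above, using base-2 logs. It reproduces the coin numbers of roughly +/-0.0029, checks that the expected calibration is zero, and shows that a uniform distribution is always exactly calibrated.

```python
import math
from typing import Dict, Hashable

Dist = Dict[Hashable, float]


def entropy_bits(p: Dist) -> float:
    """H(p) = -sum_x p(x) log2 p(x)."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def calibration(p: Dist, x0: Hashable) -> float:
    """C(x0) = -log2 p(x0) - H(p): positive means more surprised than expected."""
    return -math.log2(p[x0]) - entropy_bits(p)


def expected_calibration(p: Dist) -> float:
    return sum(q * calibration(p, x) for x, q in p.items() if q > 0)


coin = {"tails": 0.499, "heads": 0.501}
print(round(calibration(coin, "tails"), 5))     # 0.00289
print(round(calibration(coin, "heads"), 5))     # -0.00288
print(abs(expected_calibration(coin)) < 1e-12)  # True

uniform = {x: 0.25 for x in "abcd"}
print(calibration(uniform, "a"))  # 0.0
```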

With q(x) = p(x) d(x,x0), where d is the Kronecker delta (i.e. the non-normalised probability distribution that assigns mass only to x0), we have

C = D(p||q)

so this is a relative entropy of sorts.

Comment by nick_hay on Expected Creative Surprises · 2008-10-25T03:37:23.000Z · score: 3 (3 votes) · LW · GW

Anon: well-calibrated means roughly that in the class of all events you think have probability p of being true, the proportion of them that turn out to be true is p.
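That frequency definition can be checked directly from a track record, sketched here in Python. This version groups predictions by their exact stated probability (real assessments usually bin nearby probabilities together), and the track record is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def calibration_table(predictions: Iterable[Tuple[float, bool]]) -> Dict[float, float]:
    """Map each stated probability to the fraction of those predictions that came true."""
    totals: Dict[float, int] = defaultdict(int)
    hits: Dict[float, int] = defaultdict(int)
    for stated, came_true in predictions:
        totals[stated] += 1
        hits[stated] += int(came_true)
    return {p: hits[p] / totals[p] for p in sorted(totals)}


# Hypothetical track record: well calibrated at 0.7, overconfident at 0.9.
record = ([(0.7, True)] * 7 + [(0.7, False)] * 3
          + [(0.9, True)] * 6 + [(0.9, False)] * 4)
print(calibration_table(record))  # {0.7: 0.7, 0.9: 0.6}
```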

More formally, suppose you have a probability distribution over something you are going to observe. If the negative log probability (the surprisal) of the event which actually occurs equals the entropy of your distribution, you are well calibrated. If it is above, you are overconfident; if it is below, you are underconfident. By this measure, assigning every possibility equal probability will always be calibrated.

This is related to relative entropy.

Comment by nick_hay on How to Seem (and Be) Deep · 2007-10-16T22:43:00.000Z · score: 1 (1 votes) · LW · GW


The hypothesis is actual immortality, to which nonzero probability is being assigned. For example, suppose that under some scenario your probability of dying halves at each time step. Then your total probability of dying is at most 2 times the probability of dying at the very first step, which we can assume is far less than 1/2.
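A Python sketch of that bound, assuming "probability of dying at each time" means the per-step hazard (conditional on having survived so far), so the hazard at step k is p0 / 2^k. Summing the hazards gives at most 2 p0, so the probability of never dying is at least 1 - 2 p0, which is positive whenever p0 < 1/2.

```python
def probability_of_ever_dying(p0: float, steps: int) -> float:
    """Chance of dying within `steps` steps when the per-step hazard starts at p0
    and halves every step (hazard at step k is p0 / 2**k)."""
    survive = 1.0
    for k in range(steps):
        survive *= 1 - p0 / 2 ** k
    return 1 - survive


p0 = 0.01
print(probability_of_ever_dying(p0, 200))      # a little under 2 * p0 = 0.02
print(1 - probability_of_ever_dying(p0, 200))  # survival probability stays above 0.98
```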