Posts

Announcing Encultured AI: Building a Video Game 2022-08-18T02:16:26.726Z
Encultured AI Pre-planning, Part 2: Providing a Service 2022-08-11T20:11:25.151Z
Encultured AI, Part 1 Appendix: Relevant Research Examples 2022-08-08T22:44:50.375Z
Encultured AI Pre-planning, Part 1: Enabling New Benchmarks 2022-08-08T22:44:09.365Z

Comments

Comment by Nick Hay (nickjhay) on Standard and Nonstandard Numbers · 2012-12-20T12:02:42.585Z · LW · GW

Very nice. These notes say that every countable nonstandard model of Peano arithmetic is isomorphic, as an ordered set, to the natural numbers followed by lexicographically ordered pairs (r, z), for r a positive rational and z an integer. If I remember rightly, the ordering can be defined in terms of addition: x <= y iff there exists z such that x + z = y. So if we want a countable nonstandard model of Peano arithmetic with both successor and addition, we need all these nonstandard numbers.
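(In order-type notation, if I have the standard result right, this says every countable nonstandard model has order type

\omega + (\omega^* + \omega) \cdot \eta,

i.e. the naturals followed by a dense, endpointless collection of copies of the integers, which is exactly the lexicographic order on the pairs (r, z) above.)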

It seems that if we only care about Peano arithmetic with the successor function alone, then the naturals plus a single copy of the integers is a model. If I were trying to prove this, I'd start from the fact that, with only the successor function, any element of the copy of the integers is indistinguishable by a first-order predicate from a sufficiently large standard natural number, by standard first-order locality results.

Comment by Nick Hay (nickjhay) on Standard and Nonstandard Numbers · 2012-12-20T07:00:03.336Z · LW · GW

Fascinating, I thought Tennenbaum's theorem implied non-standard models were essentially impossible to visualize. The non-standard model of Peano arithmetic illustrated in the diagram only gives the successor relation; there's no definition of addition or multiplication. Tennenbaum's theorem implies there's no computable way to do this, but is there a proof that they can be defined at all for this particular model?

Comment by Nick Hay (nickjhay) on Review of Lakoff & Johnson, 'Philosophy in the Flesh' · 2011-11-07T00:21:24.562Z · LW · GW

The chapter on Chomsky contrasts the generative grammar approach, which Lakoff used to work within, with the cognitive-science-inspired cognitive linguistics approach, which Lakoff has been working in for the last few decades. Cognitive linguistics includes cognitive semantics, which is rather different from generative semantics.

Comment by Nick Hay (nickjhay) on Review of Lakoff & Johnson, 'Philosophy in the Flesh' · 2011-11-06T23:17:38.306Z · LW · GW

I largely agree with your critique, but more as a description of a different book that could have been written in this book's place. For example, a book on philosophy applying the results of this book's methodology, of which chapter 25 is a poor substitute. Or books drilling into one particular area in more detail with careful connections to the literature. This book serves better as an inspiring manifesto.

While these chapters are enlightening, they depend too heavily on the earlier account of metaphor, rarely draw upon other findings in cognitive science that are likely relevant, are sparse in scientific citations, and (as I've said) rarely cite actual philosophers claiming the things they say that philosophers claim.

Why is the dependence on the earlier theory of metaphor a problem?

Do you think the authors misrepresent what philosophers claim, in those chapters addressing philosophy (15-24) rather than (informal) philosophical ideas (9-14)?

Comment by Nick Hay (nickjhay) on Procedural Knowledge Gaps · 2011-02-08T07:38:10.518Z · LW · GW

If the goal in exercising is to lose weight, have you tried replacing carbohydrates with fat in your diet? Forcing yourself to exercise will serve to work up an appetite and make you hungry, but not to lose weight. There is a correlation between exercising and being thin, but the causality is generally perceived the wrong way around. There is also a correlation between exercising and (temporarily) losing weight, but that is confounded by diet changes, which typically involve reducing carbohydrate intake.

I've heard you mention Gary Taubes's work, but not that you've read it. If you haven't read his book, he has a new, shorter one which is well worth reading, linked here: http://www.garytaubes.com/2010/12/inanity-of-overeating/ The appendix has specific diet recommendations. Also good are these notes: http://higher-thought.net/complete-notes-to-good-calories-bad-calories/

Comment by Nick Hay (nickjhay) on Berkeley LW Meet-up Saturday November 6 · 2010-11-06T02:55:54.795Z · LW · GW

The T-rex is in the Valley Life Sciences Building. There are a few other fossils there too.

Comment by Nick Hay (nickjhay) on Fundamentally Flawed, or Fast and Frugal? · 2009-12-21T22:52:01.392Z · LW · GW

Idealized Bayesians don't have to be logically omniscient -- they can have a prior which assigns probability to logically impossible worlds.

Comment by Nick Hay (nickjhay) on Auckland meet up Saturday Nov 28th · 2009-11-15T07:26:07.738Z · LW · GW

I would be there, but I'm not back in NZ until 16th December! Everyone else should definitely go.

Comment by Nick Hay (nickjhay) on Expected utility without the independence axiom · 2009-10-29T02:08:58.398Z · LW · GW

The von Neumann-Morgenstern axioms talk just about preferences over lotteries, which are simply probability distributions over outcomes. That is, you have an unstructured set O of outcomes, and a total preorder over Dist(O), the set of probability distributions over O. They do not talk about a utility function. This is quite elegant: to make decisions you must have preferences over distributions over outcomes, but you don't need to assume that O has any particular structure, e.g. that of the reals.

The expected utility theorem says that preferences which satisfy the first four axioms are exactly those which can be represented by:

A <= B iff E[U;A] <= E[U;B]

for some utility function U: O -> R, where

E[U;A] = \sum_o A(o) U(o)

However, U is only defined up to positive affine transformation, i.e. aU + b will work equally well for any a > 0. In particular, you can amplify the standard deviation of utility as much as you like by redefining U.
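To make that concrete, here is a minimal sketch in Python (the outcome names and numbers are invented for illustration): a positive affine transform of U leaves the induced preference over lotteries unchanged, but rescales the "standard deviation of utility".

    def expected_utility(lottery, U):
        # lottery: dict mapping outcomes to probabilities; U: dict mapping outcomes to utilities
        return sum(p * U[o] for o, p in lottery.items())

    def std_utility(lottery, U):
        # standard deviation of utility under the lottery
        m = expected_utility(lottery, U)
        return sum(p * (U[o] - m) ** 2 for o, p in lottery.items()) ** 0.5

    U  = {"apple": 0.0, "banana": 0.4, "cherry": 1.0}
    U2 = {o: 3 * u + 7 for o, u in U.items()}   # positive affine transform, a=3, b=7

    A = {"apple": 0.5, "cherry": 0.5}
    B = {"banana": 1.0}

    pref_U  = expected_utility(A, U)  <= expected_utility(B, U)
    pref_U2 = expected_utility(A, U2) <= expected_utility(B, U2)
    assert pref_U == pref_U2                      # same preference order either way

    print(std_utility(A, U), std_utility(A, U2))  # 0.5 vs 1.5: the spread has tripled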

Your axioms require you to pick a particular representation of U for them to make sense. How do you choose this U? Even with a mechanism for choosing U, e.g. assume bounded nontrivial preferences and pick the unique U such that \sup_x U(x) = 1 and \inf_x U(x) = 0, this is still less elegant than talking directly about lotteries.

Can you redefine your axioms to talk only about lotteries over outcomes?

Comment by Nick Hay (nickjhay) on Extreme risks: when not to use expected utility · 2009-10-23T22:35:07.495Z · LW · GW

To be concrete, suppose you want to maximise the average utility people have, but you also care about fairness, so, all else equal, you prefer the utility to be clustered about its average. Then maybe your real utility function is not

U = (U[1] + .... + U[n])/n

but

U' = U - ((U[1]-U)^2 + .... + (U[n]-U)^2)/n

which is in some sense a mean minus a variance.
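A toy numerical illustration in Python (the utility numbers are invented; both profiles have the same mean, but the unequal one is penalised):

    def fairness_adjusted(utils):
        # mean utility penalised by the variance, as described above
        n = len(utils)
        mean = sum(utils) / n
        var = sum((u - mean) ** 2 for u in utils) / n
        return mean - var

    print(fairness_adjusted([5, 5, 5, 5]))     # 5.0   -- equal distribution
    print(fairness_adjusted([0, 0, 10, 10]))   # -20.0 -- same mean, penalised for spread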

Comment by Nick Hay (nickjhay) on Extreme risks: when not to use expected utility · 2009-10-23T22:24:33.927Z · LW · GW

Can you translate your complaint into a problem with the independence axiom in particular?

Your second example is not a problem of variance in final utility, but of aggregation of utility. Utility theory doesn't force "Giving 1 util to N people" to be equivalent to "Giving N util to 1 person". That is, it doesn't force your utility U to be equal to U1 + U2 + ... + UN, where Ui is the "utility for person i".
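For instance, nothing in the axioms rules out an aggregation like U = min(U1, ..., UN) (a purely illustrative choice), under which giving 1 util to each of N people scores 1, while giving N utils to one person and 0 to the rest scores 0.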

Comment by Nick Hay (nickjhay) on Nonparametric Ethics · 2009-06-21T22:23:36.050Z · LW · GW

Your use of the terms parametric vs. nonparametric doesn't seem to be that used by people working in nonparametric Bayesian statistics, where the distinction is more like whether your statistical model has a fixed finite number of parameters or has no such bound. Methods such as the Dirichlet process and its many variants (hierarchical DP, HDP-HMM, etc.) go beyond simple modeling of surface similarities using similarity of neighbours.

See, for example, this list of publications coauthored by Michael Jordan:

Comment by Nick Hay (nickjhay) on That You'd Tell All Your Friends · 2009-03-02T02:28:44.393Z · LW · GW

Thou Art Godshatter: gives an intuitive grasp of why and how human morality is complex, and of why not just any complex thing will do.

Comment by Nick Hay (nickjhay) on Issues, Bugs, and Requested Features · 2009-02-28T08:49:09.959Z · LW · GW

How about the buttons "High quality", "Low quality", "Accurate", and "Inaccurate"? We're increasing the number of options here, but there's probably a nice way to design the interface to reduce the cognitive load.

Using the word "vote" seems broken here more generally -- we aren't implementing some democratic process, we're aggregating judgments (read: collecting evidence) across a population.

Comment by Nick Hay (nickjhay) on Issues, Bugs, and Requested Features · 2009-02-28T08:44:52.888Z · LW · GW

Because quality and truth are separate judgments in practice, and forcing them to be conflated into a single scale loses information. To the extent that truth is positively correlated with quality this will fall out automatically: highly truthy posts will tend to have high quality. Low quality and high truth are not opposites.

Comment by Nick Hay (nickjhay) on The Thing That I Protect · 2009-02-08T05:03:26.000Z · LW · GW

Z. M. Davis: Good point, I was brushing that distinction under the rug. From this perspective all people arguing about values are trying to change someone's value computation, to a greater or lesser degree, i.e. this is not the place to look if you want to discriminate between "liberal" and "conservative".

With the obvious way to implement a CEV, you start by modeling a population of actual humans (e.g. Earth's), then consider extrapolations of these models (knew more, thought faster, etc.). No "wipe culturally-defined values" step, however that would be defined.

Where was it suggested otherwise?

Comment by Nick Hay (nickjhay) on The Thing That I Protect · 2009-02-08T03:53:07.000Z · LW · GW

Ian C: neither group is changing human values in the sense referred to here: everyone is still human, and no one is suggesting neurosurgery to change how brains compute value. See the post Value is Fragile.

Comment by Nick Hay (nickjhay) on Continuous Improvement · 2009-01-11T23:56:30.000Z · LW · GW

Interestingly, you can have unboundedly many children with only quadratic population growth, so long as they are exponentially spaced. For example, give each newborn sentient a resource token, which can be used after the age of maturity (say, 100 years or so) to fund a child. Additionally, in the years 2^i every living sentient is given an extra resource token. One can show there is at most quadratic growth in the number of resource tokens. By adjusting the exponent in 2^i we can get growth O(n^{1+p}) for any nonnegative real p.

Comment by Nick Hay (nickjhay) on What I Think, If Not Why · 2008-12-12T02:57:00.000Z · LW · GW

Phil: Yes. CEV completely replaces and overwrites itself, by design. Before this point it does not interact with the external world to change it in a significant sense (it cannot avoid all change; e.g. its computer will add tiny vibrations to the Earth, as all computers do). It executes for a while then overwrites itself with a computer program (skipping every intermediate step here). By default, and if anything goes wrong, this program is "shutdown silently, wiping the AI system clean."

(When I say "CEV" I really mean a FAI which satisfies the spirit behind the extremely partial specification given in the CEV document. The CEV document says essentially nothing of how to implement this specification.)

Comment by Nick Hay (nickjhay) on The Nature of Logic · 2008-11-18T04:56:55.000Z · LW · GW

Personally, I prefer the longer posts.

Comment by Nick Hay (nickjhay) on Expected Creative Surprises · 2008-10-25T08:53:59.000Z · LW · GW

guest: right, so with those definitions you are overconfident if you are surprised more than you expected, underconfident if you are surprised less, calibration being how close your surprisal is to your expectation of it.

Comment by Nick Hay (nickjhay) on Expected Creative Surprises · 2008-10-25T08:03:47.000Z · LW · GW

I think there's a sign error in my post -- it should be C(x0) = \log p(x0) + H(p).

Comment by Nick Hay (nickjhay) on Expected Creative Surprises · 2008-10-25T08:00:23.000Z · LW · GW

Anon: no, I mean the log probability. In your example you will generally be well calibrated: -\log 0.499 - H(p) ~= 0.00289 (with logs base 2) each time the coin comes up tails, and -\log 0.501 - H(p) ~= -0.00289 each time it comes up heads. It's continuous.

Let's be specific. We have H(p) = - \sum_x p(x) \log p(x), where p is some probability distribution over a finite set. If we observe x0, then say the predictor's calibration is

C(x0) = \sum_x p(x) \log p(x) - \log p(x0) = - \log p(x0) - H(p)

so the expected calibration is 0, by the definition of H(p). The calibration is continuous in p. If \log p(x0) is higher than the expected value of \log p(x) then we are underconfident and C(x0) < 0; if \log p(x0) is lower than expected we are overconfident, and C(x0) > 0.

With q = p(x) d(x,x0) the non-normalised probability distribution that assigns value only to x0, we have

C = D(p||q)

so this is a relative entropy of sorts.
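A minimal sketch of this measure in Python, using the sign convention defined in this comment and base-2 logarithms (which reproduce the 0.00289 figures from the coin example above):

    import math

    def calibration(p, x0):
        # C(x0) = -log p(x0) - H(p): surprisal of the observed outcome
        # minus the expected surprisal (entropy) of the prediction.
        H = -sum(q * math.log2(q) for q in p.values() if q > 0)
        return -math.log2(p[x0]) - H

    p = {"heads": 0.501, "tails": 0.499}
    print(calibration(p, "tails"))   # ~ +0.00289
    print(calibration(p, "heads"))   # ~ -0.00289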

Comment by Nick Hay (nickjhay) on Expected Creative Surprises · 2008-10-25T03:37:23.000Z · LW · GW

Anon: well-calibrated means roughly that in the class of all events you think have probability p of being true, the proportion of them that turn out to be true is p.

More formally, suppose you have a probability distribution over something you are going to observe. If the negative log probability (surprisal) of the event which actually occurs is equal to the entropy of your distribution, you are well calibrated. If it is above, you are overconfident; if it is below, you are underconfident. By this measure, assigning every possibility equal probability will always be calibrated.

This is related to relative entropy.

Comment by Nick Hay (nickjhay) on The Quantum Arena · 2008-04-20T01:57:45.000Z · LW · GW

Just in case it's not clear from the above: there are uncountably many degrees of freedom to an arbitrary complex function on the real line, since you can specify its value at each point independently.

A continuous function, however, has only countably many degrees of freedom: it is uniquely determined by its values on the rational numbers (or any dense set).

Comment by Nick Hay (nickjhay) on Thou Art Godshatter · 2007-11-14T01:59:02.000Z · LW · GW

Eliezer: poetic and informative. I like it.

Comment by Nick Hay (nickjhay) on How to Seem (and Be) Deep · 2007-10-16T22:43:00.000Z · LW · GW

Tiiba:

The hypothesis is actual immortality, to which nonzero probability is being assigned. For example, suppose that under some scenario your probability of dying at each successive time step decreases by a factor of 1/2. Then your total probability of dying is 2 times the probability of dying at the very first step, which we can assume is far less than 1/2.
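Spelling out the arithmetic: if the probability of dying at step k is p (1/2)^k for k = 0, 1, 2, ..., then the total probability of ever dying is

\sum_{k=0}^{\infty} p (1/2)^k = 2p,

so any p < 1/2 leaves probability 1 - 2p > 0 of never dying at all.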

Comment by Nick Hay (nickjhay) on Cached Thoughts · 2007-10-13T15:36:38.000Z · LW · GW

Felix: Yes, for example see http://en.wikipedia.org/wiki/NC_%28complexity%29

Comment by Nick Hay (nickjhay) on A Priori · 2007-10-10T21:44:06.000Z · LW · GW

Eliezer: "You could see someone else's engine operating materially, through material chains of cause and effect, to compute by "pure thought" that 1 + 1 = 2. How is observing this pattern in someone else's brain any different, as a way of knowing, from observing your own brain doing the same thing? When "pure thought" tells you that 1 + 1 = 2, "independently of any experience or observation", you are, in effect, observing your own brain as evidence."

Richard: "It's just fundamentally mistaken to conflate reasoning with "observing your own brain as evidence"."

Eliezer: "If you view it as an argument, yes. The engines yield the same outputs."

Richard: "What does the latter have to do with rationality?"

Pure thought is something your brain does. If you consider having successfully determined a conclusion by pure thought to be evidence that the conclusion is correct, then you must consider the output of your brain (i.e. its internal representation of this conclusion, which is to say yours) as valid evidence for the conclusion. Otherwise you have no reason to trust that your conclusion is correct, because this conclusion is exactly the output of your brain after reasoning.

If you consider your own brain as evidence, and someone else's brain works in the same way, computing the same answers as yours, then observing their brain is the same as observing your brain, which is the same as observing your own thoughts. You could know abstractly that "Bob, upon contemplating X for 10 minutes, would consider it a priori true iff I would", perhaps from knowledge of how both of your brains compute whether something is a priori true. If you then found out that "Bob thinks X a priori true" you could derive that X was a priori true without having to think about it: you know your output would be the same ("X is a priori true") without having to determine it.

Comment by Nick Hay (nickjhay) on Conservation of Expected Evidence · 2007-08-13T23:08:03.000Z · LW · GW

One reason is Cox's theorem, which shows that any quantitative measure of plausibility must obey the axioms of probability theory. This result, conservation of expected evidence, then follows as a theorem.

What is the "confidence level"? Why is 50% special here?

Comment by Nick Hay (nickjhay) on Conservation of Expected Evidence · 2007-08-13T21:55:16.000Z · LW · GW

Perhaps this formulation is nice:

0 = (P(H|E)-P(H))P(E) + (P(H|~E)-P(H))P(~E)

The expected change in probability is zero (for if you expected a change you would have already changed).

Since P(E) and P(~E) are both positive, to maintain balance, if P(H|E) - P(H) < 0 then P(H|~E) - P(H) > 0. If P(E) is large then P(~E) is small, so (P(H|~E) - P(H)) must be large to counteract (P(H|E) - P(H)) and maintain balance.
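For completeness, the identity is just the law of total probability rearranged:

(P(H|E) - P(H)) P(E) + (P(H|~E) - P(H)) P(~E)
= [P(H|E) P(E) + P(H|~E) P(~E)] - P(H) [P(E) + P(~E)]
= P(H) - P(H)
= 0.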

Comment by Nick Hay (nickjhay) on Chronophone Motivations · 2007-03-25T03:47:47.000Z · LW · GW

It seems the point of the exercise is to think of non-obvious cognitive strategies, ways of thinking, for improving things. The chronophone translation is both a tool for finding these strategies by induction and a rationality test to see if the strategies are sufficiently unbiased and meta.

But what would I say? The strategy of searching for and correcting biases in thought, failures of rationality, would improve things. But I think I generated that suggestion by thinking of "good ideas to transmit", which isn't meta enough. Perhaps if I discussed various biases I was concerned about, giving a stream-of-thought analysis of how to improve on a particular bias (say, anthropomorphism), this would be invoking the strategy rather than referencing it, thus passing the filter. Hmmm.