Colonization models: a tutorial on computational Bayesian inference (part 2/2)

post by snarles · 2011-05-17T03:54:29.547Z · LW · GW · Legacy · 6 comments

Contents

  Recap
  Section III.  Inferential Methodology
  Section IV.  Implementation and Demonstration
  Section V. Prior Specification; MAP estimation
6 comments

Recap

Part 1 was a tutorial for programming a simulation of the emergence and development of intelligent species in a universe 'similar to ours.'  In part 2, we will use the model developed in part 1 to evaluate different explanations of the Fermi paradox. However, keep in mind that the purpose of this two-part series is to showcase useful methods, not to obtain serious answers.

We summarize the model given in part 1:

SIMPLE MODEL FOR THE UNIVERSE (the full specification, with the parameters a, b, c, d, e, and k, is given in part 1)

Section III.  Inferential Methodology

In this section, no apologies are made for assuming that the reader has a solid grasp of the principles of Bayesian reasoning.  Those currently following the tutorial from Part 1 may find it a good idea to skip to Section IV first.

To dodge the philosophical controversies surrounding anthropic reasoning, we will employ an impartial observer model.  Like Jaynes, we introduce a robot which is capable of Bayesian reasoning, but here we imagine a model in which such a robot is instantaneously created and randomly injected into the universe at a random point in space, and at a random time point chosen uniformly from 1 to 1000 (and the robot is aware that it is created via this mechanism).  We limit ourselves to asking what kind of inferences this robot would make in a given situation.  Interestingly, the inferences made by this robot will turn out to be quite similar to the inferences that would be made under the self-indication assumption.

It is important to precisely specify the observational powers of this robot.  The robot can estimate the age of the universe, but it cannot determine its location relative to the center of the universe.  The robot can visually detect any Type II+ civilization within its past light cone.  Furthermore, we specify that the robot can detect Type 0 civilizations which are within a small distance u of itself.  We also grant the robot the power to deduce the age of any such nearby Type 0 civilization.

(In fact, we grant these observational powers to the robot so that it has close to the same observational powers that we have.  But for now, in order to steer clear of the philosophers, we carry on innocently and say no more.)

Now, at the time of its creation, the robot happens to notice the following data.

Data D:

D1: The age of the universe is t = 500.

D2: The robot is within distance u of a newly emerged Type 0 civilization, and this civilization lies outside the sphere of visibility of every Type II+ civilization.

The robot is aware that the universe runs according to the 'simple model of the universe,' but it does not know the values of the parameters a, b, c, d, e, and k.  Given a hypothesis H as to the values of those parameters, the robot can calculate the probability of observing data D.

We will encode the entire history of the universe in the form of a random variable G which takes values within the space of all possible histories. Even after we fix a particular history, the space-time location of the robot within this history is still random.  This will be an important point to remember when doing inference under this impartial observer model.

Define the random variable N to be the number of new Type 0 civilizations at time t = 500 which lie outside the sphere of visibility of every Type II+ civilization then in existence. Since in our model at most one civilization can appear per time step, N is in fact a binary random variable. Conditioned on G taking a particular value g, N is a fixed number. Conditioned on a particular history g, the probability that the robot appears within distance u of such a civilization is (up to a constant factor, since the volume of a ball of radius u is proportional to u^3)

P(D2|H, D1, G=g) = u^3 E[N|H, D1, G=g] = u^3 E[N|H, G=g].

Here the conditioning on D1 can be dropped because the definition of N does not depend on the robot's creation time; and conditioned on G = g, the expectation is simply the fixed value of N in the history g.

The robot calculates the probability of the data under hypothesis H as

P(D|H) = P(D2|H, D1) P(D1|H) = P(D2|H, D1) (1/1000) = E[P(D2|H, D1, G)]/1000

= E[u^3 N | H]/1000 = (u^3/1000) E[N|H]

Now, supposing we wanted to compare different hypotheses H1, H2, etc., the probabilities P(D|H1), P(D|H2), etc. would all share the common constant factor (u^3)/1000 in front.  Since only the ratios of these probabilities are relevant for inference, we drop the constant factor and simply remember that

P(D|H) is proportional to E[N|H]

All we need to do, then, to evaluate the plausibility of different configurations H1, H2, etc. of the parameters, is to compute the value of E[N|H].

However, in this case, conditioning on the robot's creation time lying in a small interval around t = 500, say t in [495, 505], is computationally inconvenient, because very few histories will contain a new Type 0 civilization at those specific times.  This means that many simulations have to be performed before we can approximate E[N|H] to the necessary degree of accuracy.

We can make our problem more computationally tractable if we weaken the observational powers of our robot; hopefully, our resulting inferences will not be altered too much by this change.  (Readers are invited to check this by implementing the inference for the original setup.) In the modified problem, the robot can determine neither the age of the universe nor the age of any Type 0 civilization it detects.  The robot observes only the following data.

Data D:

D: The robot is within distance u of a Type 0 civilization which lies outside the sphere of visibility of every Type II+ civilization.

For the modified problem, we redefine the random variable N to be the number of Type 0 civilizations outside the sphere of visibility of every Type II+ civilization at the robot's (uniformly random) creation time T.  Similarly to before, we have

P(D|H) = u^3 E[E[N | H, T]],

where the inner expectation is over histories given the time T and the outer expectation is over T, or

P(D|H) is proportional to E[E[N | H, T]]

What is interesting about this conclusion is that it is similar to the Self-Indication Assumption, which, to quote the linked Wikipedia page, is the principle:

Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.

Whereas here, from Bayes' rule, we have:

Given the fact that a civilization satisfying CRITERIA X exists, the robot should (other things equal) favor hypotheses according to which many such civilizations exist at the randomly determined time T over hypotheses on which few of them exist.

Section IV.  Implementation and Demonstration

We give the code for calculating E[E[N | H, T]] below.

# Code Block 11
# (The helper functions distance() and simulation() are defined in part 1.)

# Returns TRUE if the point x lies outside the visibility sphere
# (radius r_v) of every civilization in the list civs.
no_visibles = function(civs, x) {
  no_civs = length(civs)
  if (no_civs == 0) {
    return(TRUE)
  }
  for (i in 1:no_civs) {
    current_civ = civs[[i]]
    y = current_civ$location
    if (distance(x, y) < current_civ$r_v) {
      return(FALSE)
    }
  }
  return(TRUE)
}

# For a single simulated history, averages over t = 1, ..., 1000 the
# number of Type 0 civilizations lying outside every visibility sphere.
# This is one sample whose expectation is E[E[N | H, T]].
value_n = function(history) {
  sum = 0
  for (t in 1:1000) {
    civs = history[[t]]
    no_civs = length(civs)
    if (no_civs > 0) {
      for (i in 1:no_civs) {
        civ = civs[[i]]
        if (civ$type == 0) {
          x = civ$location
          if (no_visibles(civs, x)) {
            sum = sum + 1
          }
        }
      }
    }
  }
  sum/1000
}

# Simulates reps independent histories and records value_n for each.
multiple_trials_value_n = function(reps=10) {
  values = numeric(0) # empty vector
  for (i in 1:reps) {
    history = simulation()
    values[i] = value_n(history)
  }
  values
}

reps = 10
values = multiple_trials_value_n(reps)
# approximate value of E[E[N | H, T]]
mean(values)
# standard error of the approximation
sqrt(var(values)/reps)

For those who followed part 1, the above code should be straightforward.  At the end we calculate a standard error, since we have approximated the value of E[E[N | H, T]] by averaging, over independent simulated histories, the time-averaged count computed by value_n.
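Since the standard error shrinks like 1/sqrt(reps), sharper estimates can be had simply by increasing reps, at a proportional cost in simulation time.  For example:

set.seed(1) # optional, for reproducibility
reps = 100
values = multiple_trials_value_n(reps)
mean(values)
sqrt(var(values)/reps)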

Now we can get onto the business of comparing Bayes factors for hypotheses.  Here we use the term 'Bayes factor' loosely to refer to the quantity P(D|H); strictly speaking, a Bayes factor is a ratio of two such quantities. Consider the following example hypotheses.

1) H1: "we are alone in the universe" (small chance of life arising)

# Hypothesis 1
a = 0.001; b = 0; c = 0.1; d = 0; e = 0; k = 0

2) H2: "great filter ahead" (high chance of life, but high chance of Type 0's self-destructing)

# Hypothesis 2
a = 0.1; b = 0.1; c = 0.01; d = 0; e = 0; k = 0

3) H3: "burning the cosmic commons" (high chance of life and of surviving to stage III, but high speed of colonization)

# Hypothesis 3
a = 0.1; b = 0; c = 0; d = 0.1; e = 0.1; k = 0.9

EXERCISE: Run Code Block 11 to calculate the Bayes factors for the above hypotheses.
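A minimal sketch of one way to organize this exercise is below.  It assumes, as in Code Block 11, that simulation() reads the parameters a, b, c, d, e, k from the global environment; the helper bayes_factor() is our own scaffolding, not part of the original code blocks.

# Sketch: estimate E[E[N | H, T]] for each hypothesis
# (scaffolding, not part of the original code blocks)
bayes_factor = function(params, reps = 10) {
  # install this hypothesis' parameters in the global environment,
  # where (we assume) simulation() reads them
  for (name in names(params)) {
    assign(name, params[[name]], envir = globalenv())
  }
  values = multiple_trials_value_n(reps)
  c(estimate = mean(values), std_error = sqrt(var(values)/reps))
}
h1 = list(a = 0.001, b = 0, c = 0.1, d = 0, e = 0, k = 0)
h2 = list(a = 0.1, b = 0.1, c = 0.01, d = 0, e = 0, k = 0)
h3 = list(a = 0.1, b = 0, c = 0, d = 0.1, e = 0.1, k = 0.9)
sapply(list(H1 = h1, H2 = h2, H3 = h3), bayes_factor)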

EXERCISE 2: Can you think of a hypothesis which maximizes the Bayes factor?

EXERCISE: After assigning prior weights to hypotheses of your choosing, modify the code in order to calculate the posterior probability that a Type 0 civ observing the Fermi paradox ultimately:

 

Section V. Prior Specification; MAP estimation

SOLUTION TO EXERCISE 2:  We can maximize the number of Type 0 observers of the Fermi paradox, naturally, by setting b, c, d to zero so that Type 0 civilizations remain Type 0 civilizations forever, and no Type II+ civilizations ever appear.

This 'maximum likelihood' solution is of little value to us, however, since we would probably not give much prior weight to a model in which all Type 0 civilizations survive forever and never go interstellar.  Recall that the quantity we are actually interested in, the posterior probability of a hypothesis, depends on both the Bayes factor and the prior weight of the hypothesis:

P(H|D) proportional to P(D|H) P(H)

The parameter setting H which maximizes P(H|D) is called the maximum a posteriori (MAP) estimate.
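For a finite menu of hypotheses, the normalization is straightforward to carry out.  A minimal sketch, with made-up prior weights and made-up Bayes factor estimates (the numbers below are placeholders, not outputs of any run above):

# placeholder numbers, for illustration only
prior = c(H1 = 0.4, H2 = 0.4, H3 = 0.2) # prior weights (our choice)
bf = c(H1 = 0.002, H2 = 0.010, H3 = 0.001) # estimated E[E[N | H, T]] values
posterior = prior * bf / sum(prior * bf) # P(H|D) for each hypothesis
posterior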

The business of determining prior weights of hypotheses, however, is a complicated affair.  How can we faithfully represent the state of our knowledge of the propensity for intelligent life to evolve and develop in the universe?  [Actually, we are still operating under the ruse that we are assigning priors for our robot, not for ourselves; but since in the end we only care about the robot's inferences insofar as its priors resemble our own, we might as well make the robot's priors match ours as closely as possible.]

As is the custom, we will seek convenient mathematical formulations for the prior distributions.  As a guide for coming up with priors, we note that the parameters a, b, c, d, and e are per-time-step event probabilities, so the reciprocal of such a probability is the expected waiting time for the corresponding event (an event with per-step probability p has a geometric waiting time with mean 1/p).  This suggests that it may be more natural to assign priors on the transformed parameters:

a0 = 1/a, b0 = 1/b, c0 = 1/(c+d), d0 = d/(c+d), e0 = 1/e.

(Roughly speaking, c0 is the expected waiting time until either of the two competing transitions governed by c and d occurs, and d0 is the probability that the transition taken is the one governed by d.)

Meanwhile, if we want to restrict k to lie strictly between 0 and 1, never touching the endpoints, we can do this by assigning a prior to the logit-transformed parameter

k0 = log(k) - log(1-k)

From these transformed parameters, we reconstitute the original parameters by

a = 1/a0, b = 1/b0, c = (1-d0)/c0, d = d0/c0, e = 1/e0, k = 1/(1 + exp(-k0))
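As a quick sanity check (a throwaway snippet, not part of the main code), the reconstitution formulas do invert the transformation:

# verify that (c, d) -> (c0, d0) -> (c, d) round-trips
c = 0.05; d = 0.02
c0 = 1/(c + d); d0 = d/(c + d)
(1 - d0)/c0 # recovers c
d0/c0       # recovers d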

Since a0, b0, c0, e0 have to be positive, natural priors for these would be exponential distributions.  d0, which lies between 0 and 1, might have a Beta prior.  We might as well take k0 to have a normal prior.

All of these priors are parameterized by 'hyperparameters,' which, for lack of imagination and lack of Greek symbols, we will call a1, b1, c1, d1_alpha, d1_beta, e1, and k1, at the risk of some confusion.
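To make this concrete, here is a minimal sketch of drawing one parameter set from these priors.  The specific conventions are our own choices, not fixed above: we take a1, b1, c1, e1 to be the means of the exponential priors, and k1 to be the standard deviation of the normal prior on k0; the numerical hyperparameter values are placeholders.

# Sketch: draw one parameter set from the priors
# (hyperparameter values below are placeholders)
a1 = 100; b1 = 100; c1 = 100; e1 = 100
d1_alpha = 1; d1_beta = 1; k1 = 1
draw_parameters = function() {
  a0 = rexp(1, rate = 1/a1) # exponential prior with mean a1
  b0 = rexp(1, rate = 1/b1)
  c0 = rexp(1, rate = 1/c1)
  e0 = rexp(1, rate = 1/e1)
  d0 = rbeta(1, d1_alpha, d1_beta) # Beta prior on d0
  k0 = rnorm(1, mean = 0, sd = k1) # normal prior on the logit of k
  # reconstitute the original parameters
  list(a = 1/a0, b = 1/b0, c = (1 - d0)/c0, d = d0/c0,
       e = 1/e0, k = 1/(1 + exp(-k0)))
}
draw_parameters()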

Once we have specified priors for the parameters, the very next thing a Bayesian wants to do is compute the posterior density of the parameters.  In our case, we will approximate the posterior by drawing samples from it directly.
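One simple way to do this (our suggestion for illustration; the author's continuation may differ) is sampling/importance-resampling: draw many parameter sets from the prior, weight each by its estimated Bayes factor, and resample in proportion to the weights.  A sketch, reusing the hypothetical draw_parameters() and bayes_factor() helpers sketched above:

# Sketch: sampling/importance-resampling from the posterior
n_draws = 100
draws = replicate(n_draws, draw_parameters(), simplify = FALSE)
weights = sapply(draws, function(h) bayes_factor(h, reps = 10)["estimate"])
# resample parameter sets in proportion to their estimated Bayes factors
# (n_draws and reps must be large enough that not all weights are zero)
posterior_sample = sample(draws, size = n_draws, replace = TRUE,
                          prob = weights/sum(weights))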

[to be contd.]

6 comments


comment by lukeprog · 2011-05-17T23:21:44.970Z · LW(p) · GW(p)

I'm not sure how others feel, but I would like to see more of this kind of thing.

comment by Dreaded_Anomaly · 2011-05-17T23:28:21.737Z · LW(p) · GW(p)

I agree.

comment by [deleted] · 2011-05-17T23:53:41.491Z · LW(p) · GW(p)

Thirded.

comment by PhilGoetz · 2011-05-18T03:17:32.138Z · LW(p) · GW(p)

I love concrete analysis like this.

I don't understand why we are interested in the observation of a robot created somewhere at random, instead of the observations made from an existing type 0 civilization.

comment by snarles · 2011-05-17T16:13:55.838Z · LW(p) · GW(p)

Currently thinking about the best way to get the credibility regions/MAP estimates.

If anyone wants to do a serious analysis using this type of simulation model, it's possible to code a much more efficient and elegant simulation by making use of continuous time--having the data structure store times and locations of events rather than discrete snapshots. However, the continuous-time model would be harder to tinker with, so I opted for the more transparent discrete-time model for this tutorial.

comment by Cyan · 2011-05-18T01:14:38.722Z · LW(p) · GW(p)

The package "feature" on CRAN does multivariate density estimation. You might be able to use that for MAP estimation by running it on the posterior samples, and it also might be useful for HPD credible regions.