Preference utilitarian measure of historical welfare

post by taw · 2010-04-14T13:32:21.158Z · LW · GW · Legacy · 25 comments

GDP essentially measures how good we are at making widgets - and while widgets are useful, it is a very weak and indirect measure of welfare. For example, UK GDP per capita doubled between 1975 and 2007 - and people's quality of life did improve - but it would be extremely difficult to argue that this improvement was a "doubling", or that the gap between 2007's and 1975's quality of life is greater than the gap between 1975's and hunter-gatherer times.

It's not essential to this post, but my very quick theory is that we overestimate GDP thanks to an economic equivalent of Amdahl's Law. Suppose someone's optimal consumption mix consisted of 9 units of widgets and 1 unit of personalized services, and their purchasing power increased so that they can now acquire 100x as many widgets, but still only the same number of services as before. The amount of the mix they can purchase has increased only about 9x, not the 90x you'd get from a weighted average of the original consumption levels (and they now spend 92% of their purchasing power on services). The least scalable factor - whichever it is - will be the bottleneck.
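To make the arithmetic concrete, here is a minimal sketch in Ruby. It assumes widgets and services each start at a price of 1 with an income of 10, and that the consumer always buys the fixed 9:1 bundle. The prices, the income, and the helper name bundles_affordable are illustrative assumptions, not anything taken from actual GDP statistics.

```ruby
# Illustrative numbers only: both goods start at price 1, income is 10,
# and the consumer always buys widgets and services in a fixed 9:1 bundle.
def bundles_affordable(income, widget_price, service_price, widgets: 9, services: 1)
  income / (widgets * widget_price + services * service_price)
end

income = 10.0
before = bundles_affordable(income, 1.0, 1.0)   # one full bundle
after  = bundles_affordable(income, 0.01, 1.0)  # widgets are now 100x cheaper

mix_growth    = after / before                     # growth in bundles actually bought
naive_growth  = 0.9 * 100 + 0.1 * 1                # spending-weighted average of the two growth rates
service_share = (1 * 1.0) / (9 * 0.01 + 1 * 1.0)   # fraction of spending now going to services

puts format("mix growth: %.2fx, naive weighted average: %.1fx, service share: %.0f%%",
            mix_growth, naive_growth, service_share * 100)
```

Under these assumptions the bundle grows by a bit more than 9x while the weighted average reports about 90x, and roughly 92% of spending ends up on the good that did not get cheaper.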

If we're unhappy with GDP there are alternative measures like HDI, but they're highly artificial. It would be very easy to construct completely different measures which would "feel" about as right.

Fortunately there exists a very natural measure of welfare, which I haven't seen used before in this context - preference utilitarian lotteries. Would you rather live in 1700, or take a 50% chance of living in 2010 or 700? Make a list of such bets, assign numbers coherent with the bet values (with 100 for your highest and 0 for your lowest value) and you're done! By averaging many people's estimates we can hopefully reduce the noise, and get some pretty reasonable welfare estimates.

And now disclaimer time. This approach has countless problems; here are just a few, but I'm sure you can think of more.

I tried to think about such series of bets and my results are:

This seems far more reasonable than GDP's illusion of exponentially accelerating progress.

I used this Ruby code to convert bets to values on a scale of 0 to 100 (bets ordered by preference, not chronologically):

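# Convert indifference probabilities into welfare values on a 0..100 scale.
# ratios[k] is the probability of landing in the next-better era (rather than the
# next-worse one) at which the lottery is worth exactly as much as era k+1.
def linearize_ratios(*ratios)
  # Gaps between consecutive eras, best to worst; each gap is r/(1-r) times the previous one.
  diffs = ratios.inject([1.0]){|d,r| d + [d[-1] * r / (1-r)]}
  scale = diffs.inject{|a,b|a+b}
  # Start at 100 for the best era and subtract each gap, rescaled so the worst era lands on 0.
  diffs.inject([100]){|v,d| v + [v[-1] - 100.0 * d / scale]}
end
p linearize_ratios(0.7, 0.8, 0.6, 0.2, 0.4, 0.25, 0.2, 0.1, 0.9, 0.9, 0.25)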
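As a sanity check on that conversion, the sketch below repeats the function and verifies that the resulting values really are coherent with the bets: each in-between era should be worth exactly as much as its lottery between the two neighbouring eras. The reading of each ratio as "probability of the next-better era" is inferred from the code itself, and the variable names are my own.

```ruby
# Same conversion as in the post, repeated so this snippet runs on its own.
def linearize_ratios(*ratios)
  diffs = ratios.inject([1.0]) { |d, r| d + [d[-1] * r / (1 - r)] }
  scale = diffs.sum
  diffs.inject([100.0]) { |v, d| v + [v[-1] - 100.0 * d / scale] }
end

ratios = [0.7, 0.8, 0.6, 0.2, 0.4, 0.25, 0.2, 0.1, 0.9, 0.9, 0.25]
values = linearize_ratios(*ratios)

# values[0] is the best era (100), values[-1] the worst (0 up to rounding).
# For each era in between, the lottery "probability r of the next-better era,
# otherwise the next-worse era" should be worth exactly as much as the era itself.
errors = ratios.each_with_index.map { |r, k|
  i = k + 1
  lottery_value = r * values[i - 1] + (1 - r) * values[i + 1]
  (lottery_value - values[i]).abs
}
max_error = errors.max

puts "eras: #{values.size}, largest indifference error: #{max_error}"
```

Eleven bets pin down thirteen eras, because each bet only fixes how an era sits between its two neighbours; the endpoints are set to 100 and 0 by convention.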

25 comments

comment by Airedale · 2010-04-15T16:36:36.435Z · LW(p) · GW(p)

I have a question about how this would work that is related to both the values and representative sampling issues you raise. Would I or would I not adjust my gender (female) in placing my bets? Would I assume that I would be a woman of around median status and income when compared to other women of the era, or when compared to the overall population? It practically goes without saying that in many of these eras, most women had low status compared to men. Even if the more important factor in determining your position would be income, many women in many of these eras would only have income/property as a member of a household and in relation to men.

Somewhat relatedly, would someone of African descent assume he or she would be of the same ethnicity if rating, for example, the antebellum South of the United States (or many other eras/locales in the U.S.?)

It seems to me that in many cases the values problem you identify isn't just about what one would find morally repulsive even if one's physical welfare was not so bad. In many eras and locales, a woman or a member of some particular ethnic minority would have a very different lot in life in a way that directly affects his or her welfare (e.g., as a woman, not having any property rights and thus lacking security and independence; as an ethnic minority, being oppressed or even a slave).

Should everyone making these ratings take that sort of thing into account (that is, the possibility of ending up being a woman or an oppressed ethnic minority in the new era), or only raters who are female or of an ethnic minority? Did you take this into account in making your own ratings? It seems like that sort of thing could greatly affect how one rated different eras.

Replies from: taw
comment by taw · 2010-04-16T21:47:00.591Z · LW(p) · GW(p)

I did the standard historical thing and looked at median adult males. Average is usually worse than median.

This is fair, because I'm comparing only Western Europe for the last few centuries, and divisions now are mostly geographical - the situation of different people in the same country tends to be similar, but the situation of different people in different countries varies drastically. It used to be the other way around - and correcting only one way would be rather unfair.

As for women, I'm definitely not going to compare that, as the situation of a large-family stay-at-home housewife (a model which goes back at least to Ancient Greece) is simply a completely different way of life from what 20th century women do. This kind of separation of gender roles is incompatible with the modern economy, and the modern separation of gender roles is incompatible with most historical economies. I don't think that assigning low value to such a life is based on much more than prejudice, but for this discussion look at the clusterfuck which has by now spread out of Bryan Caplan's article about 19th century women's freedoms to half of the blogosphere.

comment by Unnamed · 2010-04-14T15:19:31.506Z · LW(p) · GW(p)

There's evidence that happiness is proportional to the log of income (blog post, pdf article). That suggests that GDP per capita is a decent measure of quality of life; we just shouldn't treat it as a linear relationship. Exponentially increasing GDP translates into linearly increasing happiness.

Log income predicts happiness at a given point in time, not for a whole life, so I'd expect your method to produce ratings that are closely correlated with log income multiplied by life expectancy at age 20 (to match your decision to exclude child mortality). If we can find historical data on life expectancy and GDP per capita then we could test that prediction.

Replies from: taw
comment by taw · 2010-04-14T20:05:19.541Z · LW(p) · GW(p)

There's evidence that happiness is proportional to the log of income

I don't believe this is even a remotely possible result if interpreted absolutely, for if it were true, every single person born before the 20th century would need to be suicidally depressed.

There's a reason I point to preference utilitarianism, not happiness utilitarianism.

Replies from: Unnamed, Jonii, Matt_Simpson
comment by Unnamed · 2010-04-15T02:03:42.147Z · LW(p) · GW(p)

That depends on the slope of the line and how far we are above the 0 utility threshold for a life worth living. I found this book with a table (table B21 on p. 264) of estimated historical GDP going back to the year 0, and there have been fewer than six doublings in that time. Current countries with per capita GDP at the same levels as pre-1900 Western Europe (and even year 0 Western Europe) are included in some of those analyses that found the log-linear fit, and their self-reported well-being is in line with the regression that fits the rest of the world. The log-linear relationship might break down if we go far enough into the past (or future), to places that are poor enough (or rich enough), or to societies that are different enough that GDP won't be a good measure of their material quality of life, but the data I've seen suggest that the relationship is more robust than I would've expected.

The Stevenson & Wolfers paper that I linked uses self-report measures of welfare including happiness, satisfaction with life, and amount of smiling, and finds similar log-linear relationships with income on all of them (though with different slopes and intercepts), which suggests that this relationship will apply to whichever definition of utility we use.

comment by Jonii · 2010-04-15T15:40:26.019Z · LW(p) · GW(p)

The possibility of other factors, like social factors, affecting how happy people are wasn't excluded, as far as I can tell.

comment by Matt_Simpson · 2010-04-14T21:39:36.114Z · LW(p) · GW(p)

"measured happiness" isn't necessarily happiness in the sense you are thinking of. In fact, it probably isn't.

comment by cousin_it · 2010-04-14T15:01:53.238Z · LW(p) · GW(p)

Being transported from 2010 to 1700 isn't the same as being born in 1700.

Your formulation of the question sounds unintuitive to me. We could ask a simpler question: would you rather live one life starting 2010, or two lives starting 1700?

Also you could try applying your technique to comparing the welfare of different countries right now. Many of the problems you listed will be easier to overcome.

Replies from: JGWeissman
comment by JGWeissman · 2010-04-14T18:35:57.243Z · LW(p) · GW(p)

Also you could try applying your technique to comparing the welfare of different countries right now. Many of the problems you listed will be easier to overcome.

This also has the advantage that we actually know how to transport someone to another country. If someone wanted to put resources into this, they could ask people to actually make the choice, not just imagine what choice they would make.

Replies from: taw
comment by taw · 2010-04-14T21:35:43.837Z · LW(p) · GW(p)

Unless you speak that country's language natively, have a social network there etc., this is just not at all comparable in the real world. It would still be just a thought experiment.

Replies from: JGWeissman
comment by JGWeissman · 2010-04-14T21:44:57.504Z · LW(p) · GW(p)

You are right, those are confounding factors.

Though if you look at the willingness of people to take the bets moving in both directions, you may be able to account for it. For example (ignoring bilinguals for simplicity), if in England most people don't speak French, so they are less willing to move to France, and in France most people don't speak English, so they are less willing to move to England, maybe the effect cancels out, if both are equally represented.

Though the experiment seems overly elaborate. We can look at immigration rates, and at the costs of immigration people are willing to pay.

comment by prase · 2010-04-15T15:28:22.274Z · LW(p) · GW(p)

Neolithic Middle East (5000 BCE) - 1.6, Paleolithic anywhere (20000 BCE) - 0

Why do you prefer the Neolithic Middle East to the Paleolithic? The advent of agriculture probably decreased life expectancy, and I don't see anything which could compensate for it in early agricultural societies. Put another way, I would strongly prefer the dangerous, but at least a bit adventurous, life of a hunter-gatherer to the slavish work of a primitive peasant.

Replies from: taw
comment by taw · 2010-04-16T22:20:32.966Z · LW(p) · GW(p)

Agriculture, city life, and large scale trade networks arose together, and I prefer this higher population and culture density and lower risk of violent death to the Paleolithic's somewhat higher quality of food.

Replies from: Academian
comment by Academian · 2010-04-18T22:32:38.326Z · LW(p) · GW(p)

Hey taw, did you write code to present yourself with lotteries, log your choices, and compute your utilities?

Replies from: taw
comment by taw · 2010-04-19T11:00:26.968Z · LW(p) · GW(p)

I ordered the choices until I was happy about them (that part wasn't too difficult as they're mostly chronological). Then I did "if I was about to live in the nth era, and there was a time machine that with probability p% moved me to the n+1th era, and otherwise to the n-1th, what would p need to be for me to take it" kind of lottery thinking.

Conversion of these to utilities was done by the Ruby code in the post.

comment by Tehom · 2010-04-14T21:41:07.087Z · LW(p) · GW(p)

This was proposed as an alternative to GDP, but it's not clear that it actually measures something similar. Even broadly understanding both as attempts to measure human happiness, it doesn't seem similar.

Since we have no access to time-machines, we cannot give anyone a real choice between travelling back to 1700 and staying in 2010. There are no actual consequences to what they choose. So we are not even measuring people's naive preferences, we are just measuring what they like to say or believe about 1700 vs 2010.

comment by cupholder · 2010-04-14T20:07:19.152Z · LW(p) · GW(p)

If we're unhappy with GDP there are alternative measures like HDI, but they're highly artificial. It would be very easy to construct completely different measures which would "feel" about as right.

Running with this side point: I wonder if it would be possible to invent a less arbitrary HDI by feeding the variables the HDI uses (life expectancy, literacy rate, educational enrolment, and log GDP/capita) into a PCA and using the first principal component as an HDI replacement. That'd be less arbitrary than the current average of arbitrary indices method.

Replies from: taw
comment by taw · 2010-04-14T21:21:46.201Z · LW(p) · GW(p)

PCA components cluster correlated input variables, with component weights essentially proportional to the number of inputs corresponding to them. If you put in 10 health indicators, 2 economy indicators, and 2 education indicators, your principal component will be health-based. If you put in 10 education indicators, 2 economy, 2 health, your principal component will be education-based etc. In no case will it be meaningfully "welfare".
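A toy version of this can be sketched in Ruby. Everything here is an assumption made for illustration: the indicators are idealised so that any two in the same group correlate at 0.8 and indicators in different groups are uncorrelated, and the first principal component is found by plain power iteration rather than a statistics library. The helper names block_correlation and first_component are hypothetical.

```ruby
# Hypothetical correlation matrix: indicators in the same group correlate at rho,
# indicators in different groups are uncorrelated, and the diagonal is 1.
def block_correlation(group_sizes, rho)
  group_of = group_sizes.each_with_index.flat_map { |size, g| [g] * size }
  n = group_of.size
  Array.new(n) do |i|
    Array.new(n) do |j|
      if i == j then 1.0
      elsif group_of[i] == group_of[j] then rho
      else 0.0
      end
    end
  end
end

# Power iteration: repeatedly multiply by the matrix and renormalise. This converges
# to the eigenvector with the largest eigenvalue, i.e. the first principal component.
def first_component(matrix, iterations: 200)
  v = Array.new(matrix.size, 1.0)
  iterations.times do
    v = matrix.map { |row| row.zip(v).sum { |a, b| a * b } }
    norm = Math.sqrt(v.sum { |x| x * x })
    v = v.map { |x| x / norm }
  end
  v
end

# 10 health, 2 economy, 2 education indicators.
health_heavy = first_component(block_correlation([10, 2, 2], 0.8))
health_weight = health_heavy.first(10).sum { |x| x * x }

# 2 health, 2 economy, 10 education indicators.
education_heavy = first_component(block_correlation([2, 2, 10], 0.8))
education_weight = education_heavy.last(10).sum { |x| x * x }

puts "health share of first component:    #{health_weight.round(4)}"
puts "education share of first component: #{education_weight.round(4)}"
```

In this idealised setup the first component sits almost entirely on whichever group has the most indicators, since a block of n indicators correlated at rho has a top eigenvalue of 1 + (n - 1) * rho. Real indicator sets are correlated across groups too, so the effect would be less clean than this.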

That's how you get 5-factor models in psychology - you just know what kind of questions to put on the questionnaire, and as long as you don't stray too far from it, you'll get exactly the 5 factors you want.

PCA can only be insightful if all inputs are equally important - something that people using PCA rarely bother sanity-checking.

Replies from: rhollerith_dot_com, cupholder
comment by RHollerith (rhollerith_dot_com) · 2010-04-14T22:26:16.890Z · LW(p) · GW(p)

Thanks for this comment, taw. I'd been wondering whether PCA is solid evidence that the Big Five personality traits carve reality at the joints.

Replies from: Unnamed
comment by Unnamed · 2010-04-15T01:47:04.355Z · LW(p) · GW(p)

The Big Five personality model was originally developed by researchers who raided dictionaries for every personality trait term that they could find, had people rate themselves (or others) on hundreds or even thousands of them, and kept finding this five factor solution that explained a lot of variance. Studies in other languages and cultures typically find similar results, although it doesn't always replicate perfectly (e.g., a missing factor, an extra factor or two, a slightly different meaning for one factor). In some ways it reflects people's lay theories of personality more strongly than actual personality, so it might share some widespread blind spots or misconceptions, but it was constructed in a thorough, systematic way (and there is evidence that each factor predicts behaviors, so it can't be too wildly off).

comment by cupholder · 2010-04-14T22:34:42.651Z · LW(p) · GW(p)

Good point, thanks.

comment by MichaelVassar · 2010-04-14T18:37:51.938Z · LW(p) · GW(p)

This makes sense to me, but my ratings would be very different from yours. Also, is your rating for Western Europe 1900 colored by hindsight of two world wars, viral encephalitis, Spanish flu and the rise to domination of the bureaucratic state? How clear are you being about socio-economic class? Are we just assuming the population distributions that existed? If so, ancient world slavery might make it less appealing than the Paleolithic, I think.
As noted, time travel is problematic, but in what sense could a Paleolithic person 'be' me?

Replies from: taw
comment by taw · 2010-04-14T21:34:06.307Z · LW(p) · GW(p)

Also, is your rating for Western Europe 1900 colored by hindsight of two world wars, viral encephalitis, Spanish flu and the rise to domination of the bureaucratic state?

Definitely not, I estimate that in most parallel universes such things didn't happen. Their very low likelihood was the very strong expert consensus of the time, and we really don't have any new knowledge leading us to believe that they were likely.

Replies from: Academian
comment by Academian · 2010-04-18T22:26:23.816Z · LW(p) · GW(p)

Well, they're at least more likely than our priors for them... they happened. Even with only a tiny prior that a coin is heads-biased, it landing heads is evidence for it.
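A quick Bayes update shows the size of the effect in the coin analogy. The three numbers below (a 1% prior, a 90% heads rate for the biased coin, 50% for the fair one) are assumptions picked purely for illustration.

```ruby
# All three numbers are assumptions picked for illustration.
prior_biased   = 0.01  # prior probability that the coin is heads-biased
p_heads_biased = 0.9   # chance of heads if the coin is biased
p_heads_fair   = 0.5   # chance of heads if the coin is fair

# Bayes' rule: P(biased | heads) = P(heads | biased) * P(biased) / P(heads)
p_heads   = prior_biased * p_heads_biased + (1 - prior_biased) * p_heads_fair
posterior = prior_biased * p_heads_biased / p_heads

puts format("prior: %.3f, posterior after one head: %.3f", prior_biased, posterior)
```

With these numbers one head moves the estimate from 1% to a little under 2%: higher than the prior, but still low.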

Replies from: taw
comment by taw · 2010-04-19T10:55:28.896Z · LW(p) · GW(p)

You're privileging a hypothesis of events that happened. It was never 50% current world : 50% something else - so that once you add the fact that the current world happened, we're over the 50% line.

Plenty of things which have happened had negligibly low probabilities.