How subjective is attractiveness?

post by JonahSinick · 2015-01-13T00:04:00.237Z · score: 25 (27 votes) · LW · GW · Legacy · 38 comments


  Gender differences
  Hierarchical modeling
  The distribution of ratings for a fixed person
  To be continued...

Consider the two statements:

* There is a universal standard for beauty.
* Beauty is in the eye of the beholder.

Most people would agree that there's some truth to each of these statements. At Thing of Things Ozy wrote:

As for the beauty thing… well, yeah, everyone’s beautiful in the sense that everyone is sexually attractive to someone, and that human bodies in general are pretty cool-looking. But conventional attractiveness is still a thing. While I’m fairly conventionally attractive (thin, white, clear skin, symmetrical features), I doubt hairy legs, bound chests, and haircuts that make one look like a teenage boy are going to be all the rage at Cosmo any time soon.

This post explores the question of the extent to which each of the two statements is true, using data from a study of speed dating events conducted by Raymond Fisman and Sheena Iyengar. 

The basic facts that I describe here are:

There's much more to say about how to interpret the group consensus and its implications, which I'll go into in a later post.

Each event involved ~15 men and ~15 women, and everybody of a given gender went on speed dates with everyone of the opposite gender. Each participant on each date rated his or her partner on a number of dimensions, including attractiveness, on a scale from 1 to 10. For the purpose of this post, I focused on how attractive raters found a ratee relative to other ratees. For this reason, I scaled each rater's ratings so that the averages are the same for all raters of a given gender.
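As a rough illustration of the per-rater scaling described above, here is a minimal sketch. It assumes the scaling is a simple additive shift of each rater's scores to a common mean; the post does not say exactly how the scaling was done, and the function name, tuple layout and toy data are all hypothetical.

```python
from collections import defaultdict

def center_by_rater(ratings, target_mean):
    """Shift each rater's scores so that every rater's mean equals target_mean.

    ratings: list of (rater_id, ratee_id, score) tuples for raters of one gender.
    Assumption: an additive shift; the original analysis may have scaled differently.
    """
    by_rater = defaultdict(list)
    for rater, _, score in ratings:
        by_rater[rater].append(score)
    rater_mean = {r: sum(s) / len(s) for r, s in by_rater.items()}
    return [
        (rater, ratee, score - rater_mean[rater] + target_mean)
        for rater, ratee, score in ratings
    ]

# Hypothetical toy data: m1 is a generous rater, m2 a harsh one.
toy = [("m1", "w1", 8), ("m1", "w2", 6), ("m2", "w1", 5), ("m2", "w2", 3)]
scaled = center_by_rater(toy, target_mean=6.5)
```

After the shift, both toy raters place w1 one point above and w2 one point below the common mean, which is the "relative to other ratees" information the post cares about.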

Gender differences

One sees essentially the same phenomena when the raters are men and the ratees are women as one does when the genders are reversed. There is, however, one very important difference: the average of the ratings that men gave women was ~6.5, and the average of the ratings that women gave men was ~5.9. The standard deviations were (interestingly) the same in both cases, and in terms of standard deviations, women were rated 0.5 SD higher than men were. This fact may have profound ramifications. I've pictured the distributions of average attractiveness ratings of men and of women below:

The main difference between the distributions is that the one for women is shifted to the right relative to the one for men. The shapes of the distributions are also a little bit different, but one can verify that the difference is within the range of what one would expect by chance.

Hierarchical modeling

We're interested in what the average ratings would be if a sufficiently large number of raters rated a given ratee.

The ratees who are rated highest and lowest are also the ratees whose ratings are most likely to be unrepresentative of the entire population's consensus on their attractiveness: there's regression to the mean.

A methodology that allows us to correct for this is Bayesian hierarchical modeling, which involves simultaneously estimating the "true" distribution of average attractiveness ratings of all hypothetical ratees together with the true average attractiveness ratings of the particular ratees in the dataset. The default assumption in Bayesian hierarchical modeling is that the true distribution is a normal distribution with mean and standard deviation to be determined. The histograms above suggest that this is close to being true in our setting.

If we use Bayesian hierarchical modeling to generate refined estimates for the averages, we get distributions that look something like the following:

Note that, in contrast with the actual averages, the refined estimates are never below 4.5 or above 8 – the participants weren't rated by enough people for us to be confident that any participant is that far away from average.

The standard deviations of the distributions are nearly identical: 0.6 points on the 10 point scale.
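The shrinkage effect described in this section can be sketched with a much simpler stand-in for the full Bayesian hierarchical model: empirical-Bayes (normal-normal) shrinkage, where each ratee's raw average is pulled toward the overall mean by an amount that depends on how noisy that average is. This is not the model actually fitted for the post; the function name, the method-of-moments variance estimates and the toy data are assumptions made for illustration.

```python
from statistics import mean, pvariance

def shrink_means(ratings_by_ratee):
    """Empirical-Bayes shrinkage of per-ratee mean ratings toward the grand mean.

    ratings_by_ratee: dict mapping ratee_id -> list of ratings received.
    Simplified stand-in for a full hierarchical model, not the post's method.
    """
    raw = {k: mean(v) for k, v in ratings_by_ratee.items()}
    grand = mean(list(raw.values()))
    # Pooled within-ratee variance: the noise in a single rating.
    n_total = sum(len(v) for v in ratings_by_ratee.values())
    ss_within = sum(
        sum((x - raw[k]) ** 2 for x in v) for k, v in ratings_by_ratee.items()
    )
    sigma2 = ss_within / (n_total - len(ratings_by_ratee))
    # Between-ratee variance of the "true" means (method of moments, floored at 0).
    avg_noise = mean([sigma2 / len(v) for v in ratings_by_ratee.values()])
    tau2 = max(pvariance(list(raw.values())) - avg_noise, 0.0)
    shrunk = {}
    for k, v in ratings_by_ratee.items():
        # Weight on the raw mean: near 1 with many raters, smaller with few.
        weight = tau2 / (tau2 + sigma2 / len(v)) if tau2 > 0 else 0.0
        shrunk[k] = grand + weight * (raw[k] - grand)
    return shrunk

# Hypothetical toy data: three ratees, four raters each.
toy = {"a": [9, 8, 10, 9], "b": [6, 7, 5, 6], "c": [3, 4, 2, 3]}
est = shrink_means(toy)
```

In this toy case the highest and lowest raw averages (9 and 3) move slightly toward the grand mean of 6, while the middle ratee stays put. With fewer raters per ratee or noisier ratings, the pull toward the mean would be stronger.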

The distribution of ratings for a fixed person

The image below shows the ratings of 18 women by 17 men.


One sees that, with the exception of the ratees in columns 10 and 16, all ratees had at least one rater who perceived her attractiveness to be noticeably above average and at least one rater who perceived her attractiveness to be noticeably below average.

The graph below shows the median rating (black), maximum rating (red) and minimum rating (blue) for all ratees in the study, together with best fit curves:

Here too, one sees that there are very few people who are consistently rated as being above average or below average.

This is consistent with the fact that the standard deviation of the ratings that an individual was given was roughly the same as the standard deviation of average ratings of the population of ratees. I've plotted the standard deviations for individual ratees below:

We see that the standard deviations have a strong central tendency, with mean equal to ~0.7 points.
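For readers who want to reproduce this kind of summary on their own data, here is a minimal sketch of the per-ratee statistics discussed in this section (median, minimum, maximum and standard deviation of the ratings each person received), along with the comparison between the typical within-ratee spread and the spread of ratee averages. The function name and the toy data are hypothetical; none of the numbers come from the study.

```python
from statistics import mean, median, stdev

def ratee_summary(ratings_by_ratee):
    """Per-ratee median, min, max and sample SD of the ratings received."""
    return {
        ratee: {"median": median(s), "min": min(s), "max": max(s), "sd": stdev(s)}
        for ratee, s in ratings_by_ratee.items()
    }

# Hypothetical toy data: two ratees, three raters each.
toy = {"w1": [5, 6, 7], "w2": [6, 7, 8]}
summary = ratee_summary(toy)

# Average spread of ratings for a fixed person...
within_sd = mean([v["sd"] for v in summary.values()])
# ...versus the spread of average ratings across people.
between_sd = stdev([mean(s) for s in toy.values()])
```

Comparing `within_sd` with `between_sd` is the comparison the paragraph above makes: when the two are of similar size, disagreement about one person is about as large as the differences between people.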

The average standard deviation being 0.7 points overstates the variability in perceptions of an individual's attractiveness. Some reasons for this are:



In order to estimate the true standard deviation of the distribution of perceptions of a given person's attractiveness, I examined the relative predictive power of:

(i) Our refined estimate of the group consensus on ratees' attractiveness

(ii) The extent to which a rater's rating deviates from this estimate

in the context of predicting a rater's decisions as to whether or not to see a ratee again.

I found that 60% of the predictive power comes from the group consensus and 40% of the predictive power comes from deviations from the group consensus, suggesting that the standard deviation of variation in perceptions of a ratee's attractiveness is about 2/3 that of the standard deviation of the group consensus across ratees. In terms of points on a 10 point scale, this is about 0.45 points.

To be continued...

In subsequent posts, I'll describe how the data bears on the following questions:




Comments sorted by top scores.

comment by Pablo_Stafforini · 2015-01-13T10:09:05.889Z · score: 7 (7 votes) · LW · GW

Interesting post!

Christian Rudder from OkTrends (OkCupid's blog) found that the shape of the distribution of male attractiveness ratings varied significantly across female ratees. Did you observe a similar phenomenon?

comment by Jacobian · 2015-01-14T05:45:04.888Z · score: 3 (3 votes) · LW · GW

Let me be an Excel sidekick among statistical analysis heroes.

I saw the OKCupid stuff as well, I ran a quick test in Excel to see if the variance in attractiveness contributes to the decision to meet beyond the attractiveness mean. Here's what I got doing regression, with apologies for the hideous formatting:

              Coefficients     Standard Error    t Stat          P-value
 Intercept    -0.569931558     0.042946471       -13.27074239    4.65749E-35
 avg_attr      0.156634411     0.005238302        29.90175402    2.6299E-117
 attr_std      0.028596624     0.012485497         2.290387431   0.022377128

The dependent variable is match percent (percent of people who decided they want to date the ratee), avg_attr is the mean and attr_std the standard deviation of the physical attractiveness ratings. attr_std is not the attractiveness to STDs ;-)

As we can see, the coefficient for attractiveness deviation is significantishly positive. It actually has a small negative correlation with match and a larger negative correlation with attractiveness. This means that there is more consensus on the attractiveness of prettier people. Holding attractiveness constant, variance, which is visible for a single rater as an "unusual look", increases the chances that people will want to date you. Put some flowers in your hair!
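For anyone who wants to run the same kind of regression outside Excel, here is a minimal standard-library sketch of ordinary least squares with an intercept and two predictors. The toy rows are made up and generated without noise from coefficients rounded from the table above, so the fit simply recovers those coefficients; it does not reproduce the analysis on the real data, and the function name is hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical ratees: (avg_attr, attr_std). Not taken from the dataset.
rows = [(5.0, 1.0), (6.0, 1.5), (7.0, 1.2), (8.0, 2.0), (6.5, 0.8)]
# Noise-free toy response built from coefficients rounded from the table.
match = [-0.57 + 0.157 * a + 0.029 * s for a, s in rows]
X = [[1.0, a, s] for a, s in rows]  # leading 1.0 is the intercept column
intercept, b_avg, b_std = ols(X, y=match)
```

On real data one would also want standard errors and p-values, as in the table, which this sketch does not compute.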

comment by JonahSinick · 2015-01-13T22:11:43.828Z · score: 2 (2 votes) · LW · GW

Thanks :-).

I haven't looked at how the shapes of the distributions vary yet. The variability in standard deviations seems consistent with the phenomenon described in the OkCupid blog post, but I don't know whether the high variance distributions tend to be bell curve shaped with larger standard deviations or bimodal.

What is true is that there was no statistically significant secondary dimension of attractiveness. One would find other dimensions if there were a sufficiently large number of people at the events, but it's unclear how large "sufficiently large" is – it could be 10 more people, or it could be 100 more people. I'll be writing more about this later.

Undoubtedly, the homogeneity of the population studied also plays a role: if a woman with this facial adornment were at the event, and the event included some men from her culture, perceptions of her attractiveness would be extremely polarized.

comment by someonewrongonthenet · 2015-01-14T02:51:35.051Z · score: 0 (0 votes) · LW · GW

Undoubtedly, the homogeneity of the population studied also plays a role: if a woman with this facial adornment were at the event, and the event included some men from her culture, perceptions of her attractiveness would be extremely polarized.

I hear this a lot, and the Mursi are always used as an example.

I don't think attraction is that malleable. Personally, I suspect that as a culture the Mursi simply don't prioritize beauty. They have marriages which are arranged as children, with cattle as a medium of exchange. They probably don't think about sexuality the same way at all.

comment by JonahSinick · 2015-01-14T05:51:31.544Z · score: 0 (0 votes) · LW · GW

I don't think attraction is that malleable.

Certainly my intuition based on day to day experience and observations is the same, and alleged very large cultural differences are puzzling to me and I wouldn't be surprised if the matter were resolved in your favor.

But note that if nothing else, the example shows that cultures vary in what they consider to be obviously unattractive.

comment by someonewrongonthenet · 2015-01-16T07:46:19.074Z · score: 3 (3 votes) · LW · GW

But note that if nothing else, the example shows that cultures vary in what they consider to be obviously unattractive.

That's not necessarily true - we don't need to look too far from our own culture to see intentional downplaying of attractiveness (modesty, "evil eye", etc)

I was gonna write more on this topic but then decided to just go and check what the anthropologists report the Mursi themselves say concerning lip plates:

Even after reading, it's still not quite clear.

My interpretation of this is that it's less about sheer beauty and more a way of being what in our culture we'd call "put together". A woman who does not wear her plate during the ritual periods when being put together is necessary is perceived as what we'd call "sloppy", and it would be associated with a lack of discipline in other areas of life. (It's also a tribal identity marker and a way to make money from tourists, of course)

The plate definitely maps onto something which is present in our own psychology, but I'm not at all convinced that it's attractiveness. I think you're right that it doesn't actively ruin attractiveness, which does indicate malleability.

(By the way, historically the anthropologists first thought it was an anti-rape measure (but the Mursi denied it), and then they thought it was a beauty mark which determined higher dowry (until it was discovered that dowry was set at birth). That second misconception is probably why it circulates on the internet as an example of divergent beauty standards. That's not to say that it isn't divergent beauty standards, but it's not just that.)

comment by gjm · 2015-01-14T11:26:53.672Z · score: 1 (1 votes) · LW · GW

Day-to-day experience and observation can give you evidence that attraction isn't very malleable in adults. Do your experience and observation tell you anything about whether, e.g., who you end up being attracted to depends strongly on the people you see around you before age 5?

comment by G-Max · 2015-01-14T12:35:53.653Z · score: -4 (4 votes) · LW · GW

Who in the nine circles of Hell would give the girl on the right a "1"? There's some dishonest rating going on here.

comment by Nornagest · 2015-01-14T17:48:54.649Z · score: 5 (5 votes) · LW · GW

People who don't like the subcultural signals she's throwing out.

The OKCupid ratings aren't supposed to be some kind of objective measure of beauty; they're supposed to capture the rater's subjective impression of how much they'd like to get to know the person in the photo. That means they end up depending on a lot of things other than raw physical attractiveness.

A while back, lukeprog wrote on low-variance and high-variance strategies in the dating market. His examples were a guy in business casual and a guy in full goth regalia, but something similar's going on here.

comment by G-Max · 2015-01-14T20:09:51.890Z · score: -4 (4 votes) · LW · GW

First of all, "subcultural signals she's throwing out"? What the hell? She's not throwing out subcultural anything.

Second, that's not how OKcupid works. Members don't rate each other's overall profiles. They rate individual pictures.

Third... holy crap, "full goth regalia" is an actual phrase used by people other than me? It's the exact same one that I made up for myself to refer to my outfit! Small world, eh?

comment by Nornagest · 2015-01-14T21:10:53.229Z · score: 3 (3 votes) · LW · GW

I haven't used OKCupid in a couple of years, but when I did, there were two paths to giving someone a star rating. You could look at their regular profile, usually including several pictures and a couple screens of text, and click a control at the top to rate it; or you could enter a quick matching system that'd show you up to three pictures and an abbreviated version of their profile text. (There were a lot of jokes about how no one reads the text, but I got the impression that most people at least skimmed it.) There was also a "My Best Face" feature that did look at individual pictures, but that used an up/down rating system rather than the star ratings, and context here suggests that we're talking about profiles. Not that any of that matters here, since everything I said in the grandparent depends only on the photos.

If you can't see subcultural signaling in the picture on the right, I don't know what to tell you. She's fairly clearly urban rather than rural, and at least middle-class but probably not upper-; she's communicating a specific type of sexuality; and she's likely into the alternative fashion scene in some way; there are other things she's saying but those are the most obvious ones. You could call it "hipster", but that's less kind and more general than what I have in mind. The picture on the left is far less culturally marked, although I could probably venture a couple of good guesses.

(Not my downvote, by the way.)

comment by G-Max · 2015-01-14T23:43:26.510Z · score: -4 (8 votes) · LW · GW

She has a white flower in her hair, and there's a brick wall behind her. There's absolutely NOTHING about either of these things to suggest whether she is urban or rural, nor what her income level is, nor anything remotely sexual. The ear bling (are those supposed to be skulls?) is unusual, but is no more indicative of being a "hipster" than it is of being a goth, or maybe it's something that her best friend made for her at summer camp ten years ago and she still wears it because said friend died in a car accident. We have no bloody idea whatsoever.

Silly neurotypicals... always overestimating their own mind-reading abilities :/

I'm now quite interested in posting some pics of myself and seeing what ridiculous conclusions you draw from them. Are you game?

comment by Kindly · 2015-01-15T02:50:04.382Z · score: 4 (4 votes) · LW · GW

Whether or not the subcultural signals are there, the only thing that matters if you want an explanation of the "1"s is whether many people would think that the subcultural signals are there. And I think that we've established that enough people not only think so, but don't understand why you can't see them.

comment by G-Max · 2015-01-15T03:05:57.392Z · score: 0 (0 votes) · LW · GW

Alright, since you've given the only remotely rational response, I'll pass the ball on to you. Would you be interested in making guesses about me based on my own OKC pictures, and then learning how right or wrong your guesses are?

comment by Jiro · 2015-01-15T09:36:29.949Z · score: 3 (3 votes) · LW · GW

If you don't comprehend signals, it's quite possible that everyone else is signalling (and thus signalling can be determined from their pictures) but you are not (and thus it can't be determined from yours). So looking at your own pictures isn't going to demonstrate anything useful.

comment by Kindly · 2015-01-15T14:21:22.106Z · score: 0 (0 votes) · LW · GW

I would prefer not to.

comment by Nornagest · 2015-01-15T01:09:32.459Z · score: 4 (4 votes) · LW · GW

There's nothing there that can prove anything I've mentioned, but there's quite a bit to suggest it. Sure, it's theoretically possible that I could be totally wrong; signals like this give evidence, not hard data. But I'd still bet at long odds that I've got most of it right.

Particularly in this context. We're talking about pictures heading a dating site profile, not random photos dug out of someone's sock drawer; now, people do vary in their ability to control the impression they give, but within that scope people on a dating site are going to have clear goals that they'll tailor their profiles toward. Clothes, setting, body language, and photography all carry information that gets used to attract those they want to attract, and to deter those they don't. And very little about that photo looks accidental to me.

comment by G-Max · 2015-01-15T02:55:56.164Z · score: -5 (5 votes) · LW · GW



comment by CCC · 2015-01-15T10:16:56.941Z · score: 3 (3 votes) · LW · GW

There is quite a bit of information.

There is the earring; there is the facial expression, the pose, the bright, harsh artificial lighting, the flower in the hair.

Let me consider the earring. The earring suggests both a pierced ear and sufficient disposable income to spend on the earring (and on the original piercing itself). The earring is large enough that it is designed to be noticed, to be seen; which implies that it is there to carry a message. It would be troublesome worn near live animals, small babies, or certain types of industrial machinery (or anything else that is likely to grab and pull); implying that she considers it unlikely that she is going to run into any of those in the near future. It also implies that she expects it to be seen; that is, she expects there to be enough people (probably strangers) around to see it. This requirement is, of course, fulfilled by the fact that the photo is going on a dating website; however, few people will purchase an earring for merely a single occasion. It is likely that the earring was purchased with the anticipation that it would be worn on multiple occasions, which in turn implies that the wearer of the earring would be seen by many strangers on multiple occasions. This implies an urban, rather than a rural, earring-wearer, as urban people are seen by strangers more regularly. (It doesn't prove urbanity, but it's enough evidence to update in the direction of urbanity).

It's probable that someone else can tell more from the earring, but that's what I see in it. (I haven't mentioned what else I see in that picture, because I am deliberately concentrating on a single signal for demonstrative purposes).

comment by Nornagest · 2015-01-15T03:39:14.705Z · score: 1 (1 votes) · LW · GW

Do you believe that the print on a graphic T-shirt carries information? Or, say, smiling? Because that's what we're dealing with here: a set of generally understood markers that you can wear or signify through your behavior or its context. Just because you don't see or, presumably, participate in it doesn't mean that it can't be a viable channel for others.

I'm not especially keen on taking you up on your earlier offer, because it would be too easy to select unrepresentative photos. But imagine that some third party dug up... oh, let's say a hundred photos like these, of men of similar age and ethnicity. I'll bet at almost any odds that observers with typical neurology and relevant cultural experience could look at those photos and use them to gauge the subjects' social class, place of residence, musical taste, hobbies, and dozens of other things at rates better than chance. Much better, if they're good at it.

comment by Wes_W · 2015-01-15T08:02:23.582Z · score: 0 (0 votes) · LW · GW


That doesn't matter. Raters only have to think there are signals, and this subthread is an existence proof of such belief.

That said, the preponderance of "1" ratings AND the excessively high number of messages received suggests to me that there is in fact something weird going on. Dishonest rating would be one reasonable hypothesis, among several.

comment by Aiyen · 2015-01-15T03:13:24.545Z · score: 0 (0 votes) · LW · GW

It looks like a signal to me. Maybe we're misinterpreting, but if so, we have multiple people making the same mistake.

comment by G-Max · 2015-01-15T04:21:10.533Z · score: -1 (1 votes) · LW · GW

Given the politicians we have in office, I'd say that "multiple people making the same mistake" is a fairly common phenomenon :)

But please, explain exactly what information you think she's conveying and why you think that this is the most probable explanation for... whatever you think you're seeing.

comment by Vaniver · 2015-01-15T01:31:48.887Z · score: 1 (1 votes) · LW · GW

Silly neurotypicals... always overestimating their own mind-reading abilities :/

Yes, proper estimates look like this:

There's some dishonest rating going on here.

As for the signals, I will confirm: you are bad at reading these sorts of signals. (The most obvious one is the sexuality signal, which is determined by her pose leaning forward and the placement of her hand.)

comment by G-Max · 2015-01-15T02:45:40.632Z · score: -5 (5 votes) · LW · GW

Uh, no... there's nothing sexual about leaning toward a camera or putting a hand near your chin. Come on, I shouldn't have to explain this on a wiki devoted to rationality.

comment by CCC · 2015-01-15T10:29:11.499Z · score: 0 (0 votes) · LW · GW

Silly neurotypicals... always overestimating their own mind-reading abilities :/

Being neurotypical allows an interesting strategy; a neurotypical person can look at someone else, and ask themselves "under what circumstances would I, or someone like me, adopt that facial expression, adopt that posture, wear those clothes?" The answer to this question then becomes the first approximation for the answer to the question "what are that person's circumstances?"

Of course, it only works if the other person is also neurotypical; but, since most people are (hence the 'typical') that is usually a fairly minor downside. Using this on non-neurotypical people can lead to entertainingly wrong conclusions. (It also helps a lot if both people are from the same culture).

comment by Aiyen · 2015-01-15T03:11:11.259Z · score: 0 (0 votes) · LW · GW

Or you're typical-minding? I'd give her a 4, but that doesn't mean that anyone and everyone is going to feel the same way. In my experience at least, perceptions of attractiveness are higher variance than most other preferences, and "no accounting for taste" is a proverb for a reason.

comment by CronoDAS · 2015-01-13T13:42:54.104Z · score: 5 (5 votes) · LW · GW


Is Kindness Physically Attractive?

The tl;dr version: The halo effect of beauty also works in reverse: if you start liking a person, they'll start to appear more physically attractive, as well.

comment by JonahSinick · 2015-01-13T22:15:41.787Z · score: 1 (1 votes) · LW · GW

Thanks for pointing out the article. My findings are in tension with the ones that the article reports on: I have an instrumental variables type argument based on the data in the opposite direction (which I'll write about soon), and I think the phenomenon in the speed dating data is more likely to generalize because the data was collected in a real world setting rather than a lab.

On the other hand, anecdotally, it seems like the situation may be very different if one looks at perceptions based on interactions that occur over longer time horizons than 4 minute speed dates.

comment by kpreid · 2015-01-13T15:41:39.216Z · score: 2 (2 votes) · LW · GW

FYI, you have a sentence missing its end: “The standard deviations of the distributions are nearly identical: 0.6 points on a ”.

Also, I think the "refined estimate" plot would be improved by setting its x-axis scale to be the same as the previous plot (even though this would create empty space).

comment by JonahSinick · 2015-01-13T21:29:01.872Z · score: 0 (0 votes) · LW · GW

Thanks, I fixed the sentence. I might change the plot as you suggested when I have time.

comment by Tenoke · 2015-01-13T14:27:20.567Z · score: 2 (6 votes) · LW · GW

*There is a universal standard for beauty.

*Beauty is in the eye of the beholder.

Just putting this out there - beauty is in fact completely subjective, and there is no universal standard nor can there be one, HOWEVER, it seems to us like beauty is objective because humans are really genetically (and socially) similar to each other. This gives rise to preferences that are shared by large groups, and the illusion that the things which many people consider attractive are objectively beautiful.

comment by JonahSinick · 2015-01-13T21:35:02.190Z · score: 2 (2 votes) · LW · GW

Yes, I meant "subjective" in a colloquial sense (the way people use it in day to day conversation) rather than a philosophical sense.

It seems possible to me that there are standards of beauty that would cut across many different species of intelligent life (including extraterrestrials) by virtue of there being similar evolutionary pressures across contexts: for example, I could imagine aliens typically viewing aliens with symmetric features as being more attractive than aliens with asymmetric features. But yes, it's in principle possible for an entity's conceptions of beauty to be completely orthogonal to those of humans.

comment by gjm · 2015-01-14T11:23:30.285Z · score: 3 (3 votes) · LW · GW

Has anyone investigated this in non-human animals here on earth?

(... I realise that I have no idea how commonly, and how strongly, visual "attractiveness" is relevant to mating of non-human animals at all. Clearly at least sometimes it's at least quite relevant (consider, e.g., peacocks), but beyond that I'm pretty clueless. If you're reading this and know much more, please educate me!)

comment by G-Max · 2015-01-14T12:32:37.340Z · score: -4 (12 votes) · LW · GW

"Ratings on a 10 point scale are imprecise"

Wrong. If anything, it's TOO precise. Attractiveness is "fuzzy". If you asked me to rank Angelina Jolie on a 5-point scale, I'd give her a 4 without hesitation, but on a ten-point scale, I have no idea what she'd be. 7? 8? On a 5-point scale, 4 means "above average but not top-tier". On a 10-point scale, is there any meaningful difference between a 7 and an 8? It's like trying to decide whether a color is "maroon" or "crimson" when any sane person would just say "dark red".

On the other hand, everyone agrees that Nancy Pelosi is a 0 on any scale :)

comment by RichardKennaway · 2015-01-16T16:51:30.186Z · score: 4 (4 votes) · LW · GW

On the other hand, everyone agrees that Nancy Pelosi is a 0 on any scale :)

I had to look up who she is. Apart from anything else, she's 74 years old, so not a part of this. But looking at her on Google Images, I noticed that I could tell just by looking at the picture whether it linked to a pro-Pelosi or anti-Pelosi web page. I guess that for Americans, how attractive someone thinks Nancy Pelosi is correlates rather well with political affiliation.

She does have good teeth for her age.

comment by [deleted] · 2015-01-16T12:17:52.931Z · score: 3 (3 votes) · LW · GW

There's plenty of research on reliability of rating scales - and the sweet spot seems to be a range from 7-10 choices at least according to quite a few studies designed to address this directly. An influential paper in this regard is Preston & Coleman's (2000) "Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences." link to PDF


Using a self-administered questionnaire, 149 respondents rated service elements associated with a recently visited store or restaurant on scales that differed only in the number of response categories (ranging from 2 to 11) and on a 101-point scale presented in a different format. On several indices of reliability, validity, and discriminating power, the two-point, three-point, and four-point scales performed relatively poorly, and indices were significantly higher for scales with more response categories, up to about 7. Internal consistency did not differ significantly between scales, but test-retest reliability tended to decrease for scales with more than 10 response categories. Respondent preferences were highest for the 10-point scale, closely followed by the seven-point and nine-point scales.

Or if one prefers a more analytic approach, here's a 2012 conference proceedings paper by Kluver et al "How many bits per rating?" link to PDF

comment by Kawoomba · 2015-01-14T19:56:33.693Z · score: 1 (1 votes) · LW · GW

Don't knock it 'til you try it.