How does personality vary across US cities?
post by JonahS (JonahSinick) · 2016-12-20T08:00:43.530Z · LW · GW · Legacy · 16 comments
In 2007, psychology researchers Michal Kosinski and David Stillwell released a personality testing app on Facebook called myPersonality. The app ended up being used by 4 million Facebook users, most of whom consented to having their answers to the personality questions and some information from their Facebook profiles used for research purposes.
The very large sample size and matching data from Facebook profiles make it possible to investigate many questions about personality differences that were previously inaccessible. Kosinski and Stillwell have used it in a number of interesting publications, which I highly recommend (e.g. [1], [2], [3]).
In this post, I focus on what the dataset tells us about how Big Five personality traits vary by geographic region in the United States.
The Five Factor Model of Personality
The Five Factor Model (FFM) or Big Five personality trait model is currently the dominant paradigm in personality research. The model is founded on the lexical hypothesis:
The lexical hypothesis is generally defined by two postulates. The first states that those personality characteristics that are most important in peoples' lives will eventually become a part of their language. The second follows from the first, stating that more important personality characteristics are more likely to be encoded into language as a single word.
When people are asked questions about whether various adjectives describe them (or describe someone who they know), their answers are pairwise correlated with one another. Applying factor analysis to the responses yields a small number of underlying factors that explain a large fraction of the variance common to the answers.
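To make the idea of "a small number of underlying factors" concrete, here is a minimal sketch on simulated data. It is not the analysis behind this post: the responses are synthetic, the names `simulate_responses` and `explained_variance_ratios` are hypothetical, and it uses an eigendecomposition of the item correlation matrix (principal components) as a simpler stand-in for a full factor analysis.

```python
import numpy as np

def simulate_responses(n_people=2000, n_factors=2, items_per_factor=5,
                       noise=0.6, seed=0):
    """Simulate questionnaire items that each load on one latent factor.

    All parameters are illustrative assumptions, not values from the
    myPersonality data.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_people, n_factors))
    n_items = n_factors * items_per_factor
    loadings = np.zeros((n_factors, n_items))
    for f in range(n_factors):
        start = f * items_per_factor
        loadings[f, start:start + items_per_factor] = 1.0
    # Each item = its factor's score plus independent noise.
    return latent @ loadings + noise * rng.normal(size=(n_people, n_items))

def explained_variance_ratios(responses):
    """Share of total variance carried by each principal component."""
    corr = np.corrcoef(responses, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sort largest first
    return eigenvalues / eigenvalues.sum()

responses = simulate_responses()
ratios = explained_variance_ratios(responses)
top_two_share = ratios[:2].sum()
print(f"Share of variance in the first two components: {top_two_share:.2f}")
```

In this toy setup ten items are generated from two latent traits, so the first two components carry most of the variance and the remaining eight carry very little. The same pattern, with five dominant components, is what motivates the five-factor description.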
Empirically, it's been found that a model with 5 factors often fits the data well (though some researchers claim that one gets 6 or 7 factors if one uses a question battery that fully exhausts descriptive adjectives, see e.g. the HEXACO model of personality and The Big Seven Model of Personality and Its Relevance to Personality Pathology for more information).
The five factors referred to as the "Big Five" are labelled extraversion, neuroticism, agreeableness, conscientiousness and openness. I will describe these more below.
It's likely that the Big Five personality model falls far short of carving reality at its joints, and I'm in broad agreement with the stance that Jack Block expresses in A contrarian view of the five-factor approach to personality description. Nevertheless, the five factors in the model satisfy some desirable criteria, such as
- Longitudinal stability, with correlations between self-report and self-report 10 years later being ~0.7.
- Self-other agreement, with correlation of ~0.4 between self-report and a friend's perceptions, and ~0.5 between self-report and a spouse's perceptions.
- External validity, with correlations found between self-reported traits and objective behaviors.
- Heritability, with twin studies yielding estimates that 40%-60% of the variance in the underlying traits is explained by genetics.
- Cross-cultural validity, cf. The Geographic Distribution of Big Five Personality Traits: Patterns and Profiles of Human Self-Description Across 56 Nations.
and much of the data that's available uses Big 5 personality questionnaires, so it's often what we have to work with.
Data, methodology and high level results
There were ~680k Americans who both answered 20+ questions on a Big Five Personality Test and made their hometown available to researchers. After excluding hometowns with <= 30 users, about 3,500 hometowns were represented. Questions were answered on a scale from 1 (strongly disagree) to 5 (strongly agree).
I estimated personality trait averages for each city using Bayesian hierarchical modeling in order to account for regression to the mean when sample sizes are small. This results in relatively large cities being more prominently represented at the extremes of the estimates, on account of the larger sample sizes making it possible to have greater confidence in city averages deviating substantially from the mean. A CSV file with all estimates of city averages is available on Dropbox.
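The effect of sample size on the estimates can be illustrated with a minimal sketch of partial pooling. This is not the model used for the estimates in this post: it assumes a simple normal model in which the between-city and within-city variances are already known, and the function name `shrunken_mean` and all numbers are hypothetical.

```python
def shrunken_mean(raw_mean, n, grand_mean=0.0, within_var=1.0, between_var=0.01):
    """Partially pooled estimate of a city's mean trait score.

    Assumes (for illustration only) a normal model with known variances:
      within_var  - variance of individual scores within a city
      between_var - variance of true city means around the grand mean
    Scores are taken to be in standard deviation units.
    """
    # Weight on the city's own data grows with its sample size.
    weight = between_var / (between_var + within_var / n)
    return grand_mean + weight * (raw_mean - grand_mean)

# Two hypothetical cities with the same raw average but different sample sizes.
small_city = shrunken_mean(raw_mean=0.3, n=40)
large_city = shrunken_mean(raw_mean=0.3, n=4000)
print(f"n=40:   {small_city:.3f}")
print(f"n=4000: {large_city:.3f}")
```

Both cities have the same raw average, but the small city's estimate is pulled most of the way back toward the overall mean while the large city's estimate barely moves. This is why large cities dominate the extremes of the estimates.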
The units in the graphs below are standard deviations away from the mean of the entire sample. Roughly speaking, average self-reported personality by city varies from -0.2 to 0.2 standard deviations from the mean. However, this likely understates the magnitudes of differences in underlying traits across cities, owing to people anchoring on the people who they know when answering the questions rather than anchoring on the national population, as described in Birds of a feather do flock together: behavior and language-based personality assessment reveal personality homophily among couples and friends:
Friends and spouses tend to be similar in a broad range of characteristics (i.e. homophily), such as age, educational level, attitudes, values, and general intelligence. Surprisingly, little evidence has been found for similarity in personality—one of the most fundamental psychological constructs. We argue that the lack of evidence for personality homophily derives from the tendency of individuals to make personality judgments in relation to a salient comparison group rather than in absolute terms when responding to the self-report and peer-report questionnaires commonly used in personality research (i.e. reference-group effect)
Extraversion
Representative questions:
- Do not mind being the centre of attention
- Make friends easily
- Keep in the background (reversed)
- Avoid contact with others (reversed)
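Items marked "(reversed)" count against the trait, so on a 1 to 5 scale they are typically flipped before being combined with the other items. The sketch below shows one common way to do this. Averaging the items into a trait score is an assumption made for illustration, not necessarily the scoring used by the myPersonality app, and the function names are hypothetical.

```python
def score_item(response, reversed_item=False, scale_min=1, scale_max=5):
    """Return the scored value of one answer, flipping reversed items."""
    if not scale_min <= response <= scale_max:
        raise ValueError(f"response {response} is outside the scale")
    if reversed_item:
        # On a 1-5 scale: 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1.
        return scale_min + scale_max - response
    return response

def trait_score(answers):
    """Average the scored items; `answers` is a list of (response, reversed) pairs."""
    scored = [score_item(response, is_reversed) for response, is_reversed in answers]
    return sum(scored) / len(scored)

# Hypothetical answers to the four extraversion items listed above.
answers = [
    (5, False),  # Do not mind being the centre of attention
    (4, False),  # Make friends easily
    (1, True),   # Keep in the background (reversed)
    (2, True),   # Avoid contact with others (reversed)
]
extraversion = trait_score(answers)
print(extraversion)
```

Someone who strongly disagrees with "Keep in the background" is treated the same as someone who strongly agrees with a positively keyed item, so all four answers push the score in the same direction.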
Party cities have high average extraversion
The appearance of New Orleans, Miami, Hollywood, Beverly Hills and Newport Beach amongst the cities with highest average extraversion is consistent with the cities' reputations for partying and socializing. New Orleans and Miami have the highest average extraversion in the data, and are 2 of the 3 American cities on this list of the top 20 party cities in the world.
The Seattle Freeze
Andrew J. Ho comments that the high frequency of cities in Washington state among those with the lowest average extraversion reminds him of the Seattle Freeze:
Newcomers to the area have described Seattleites as being standoffish, cold, distant, and not trusting.[3] While in settings such as bars and parties, people from Seattle tend to mainly interact with their particular clique.[4] One author described the aversion to strangers as: "people are very polite but not particularly friendly."[5] In 2008 a peer-reviewed study published in Perspectives on Psychological Science found that among all states, Washington residents ranked 48th in the personality trait extroverted.
Neuroticism
Representative questions:
- Often feel blue
- Get stressed out easily
- Feel comfortable with myself (reversed)
- Am not easily bothered by things (reversed)
Ethnicity as an underlying factor
Washington DC and Atlanta stand out as having unusually large African American populations, constituting roughly 50% of the population. From Wikipedia:
Atlanta has long been known as a center of black wealth, political power and culture; a cradle of the Civil Rights Movement[1] and home to Dr. Martin Luther King, Jr. It has often been called a "black mecca".
The researchers behind the myPersonality app labelled the Facebook profile photos of a subset of the users by their race, so we can stratify by race. The numbers of people for whom we have labelled photos are given below, by race.
The people in cities with low average neuroticism are heavily disproportionately African-American:
This is not a coincidence. In fact, for the sample as a whole, African Americans' self-reported neuroticism is a full 0.2 standard deviations lower than the rest of the population. This remains true even if we restrict attention to a particular city, like Washington DC:
The finding of African Americans being relatively low on neuroticism is consistent with the literature on national differences in personality. The figure below is from The Geographic Distribution of Big Five Personality Traits: Patterns and Profiles of Human Self-Description Across 56 Nations. It depicts estimates of average neuroticism by continent, showing that Africans are as a group noticeably lower in neuroticism than people from other continents.
Agreeableness
Representative questions:
- Believe that others have good intentions
- Am easy to satisfy
- Hold a grudge (reversed)
- Cut others to pieces (reversed)
Agreeableness and Mormonism?
Seven of the 10 cities with highest average agreeableness are in Utah. This corresponds to Utah residents being almost 60% Mormon: as a group, Mormons have exceptionally high average agreeableness. One can do an analysis similar to the one that I did with race and neuroticism. I'll return to this later in the context of a more systematic discussion of agreeableness and religion.
New Yorkers really are unusually disagreeable
The fact that 8 of the 10 cities with lowest average agreeableness correspond to some borough of New York City is in accordance with stereotypes of New Yorkers being unfriendly / mean / aggressive / rude (cf. New York City Ranked Sixth Most Unfriendly City in the World, Survey Finds).
Conscientiousness
Representative questions:
- Complete tasks successfully
- Am always prepared
- Need a push to get started (reversed)
- Shirk my duties (reversed)
Low conscientiousness in the Bay Area
It's striking that Berkeley, San Francisco, San Jose, Hayward and Cupertino, all of which are in the Bay Area, each make the list of the 10 cities with lowest average conscientiousness.
Connection with the in person rationalist community?
The finding that Bay Area residents skew toward unusually low conscientiousness should be of especially strong interest to the rationalist community in light of the fact that the Bay Area has become the central hub of community activity.
Slightly shifting the subject, in the 2016 Less Wrong Diaspora Survey, respondents who reported involvement with the in-person community reported having been clinically diagnosed with ADHD at a rate of ~20%, roughly 2x the rate among those who reported no involvement with the in-person community. Low conscientiousness is known to be associated with ADHD, with people who have been diagnosed with ADHD scoring an average of 1 standard deviation below the population mean. In light of these things, it seems possible that there's some connection between the high rate of clinical ADHD diagnosis amongst people involved with the in-person community and Bay Area residents being unusually low in conscientiousness.
As with low extraversion, I'd welcome any ideas on what differentiates the cities with high average conscientiousness from others...
Openness
Representative questions:
- Have a vivid imagination
- Enjoy wild flights of fantasy
- Avoid philosophical discussions (reversed)
- Do not like poetry (reversed)
Artsy cities and openness
Openness is associated with artistic interests. Hollywood is the center of cinema in the United States. Santa Fe and New Orleans are considered two of the ten most artistic cities in America. So their appearance near the top of the list is in consonance with expectations.
Political Liberalism and Openness
Openness is known to be strongly predictive of liberal political affiliation (cf. The Secret Lives of Liberals and Conservatives: Personality Profiles, Interaction Styles, and the Things They Leave Behind). So the appearance of many coastal California cities is also in consonance with expectations.
To Be Continued...
There's much more to say about personality and demographics, and I plan on writing more along these lines.
16 comments
Comments sorted by top scores.
comment by Douglas_Knight · 2016-12-20T17:31:59.816Z · LW(p) · GW(p)
this likely understates the magnitudes of differences in underlying traits across cities, owing to people anchoring on the people who they know when answering the questions
Sure, if there really are differences, this method probably underestimates them. But there are other phenomena that could create false differences, such as varying social desirability bias.
Replies from: JonahSinick↑ comment by JonahS (JonahSinick) · 2016-12-21T04:45:07.647Z · LW(p) · GW(p)
Yes, this is something that I've wondered about quite a bit specifically in connection with the variation in conscientiousness and agreeableness by religion. I plan on partially addressing this issue by discussing some objective behavioral proxies to the personality traits in later posts.
comment by Daniel_Burfoot · 2016-12-21T13:45:10.895Z · LW(p) · GW(p)
Five Factor Model (FFM) ... the model is founded on the lexical hypothesis:
I notice I am confused. I was sure that the FFM came out of doing the following simple procedure:
- Give people a many-item personality survey
- Do a PCA of the resulting data
- Keep the top 5 eigenvectors
- Label them with reasonably accurate adjectives that seem to describe the general drift of the vector
How wrong is this? How important is the "lexical hypothesis" part?
Replies from: Douglas_Knight↑ comment by Douglas_Knight · 2016-12-21T16:34:28.277Z · LW(p) · GW(p)
That's right. The lexical hypothesis only comes in at step 1 by including questions like "I am [adjective]." We start with a vague theory in the questionnaire and apply dimension reduction. The lexical hypothesis is that language gives us a vague theory. We want as broad a theory as possible, so it is useful to combine questionnaires. Some sources claim that the original questionnaire was generated from language without questions from explicit theories, but I don't think that's correct.
comment by Qiaochu_Yuan · 2016-12-20T21:08:15.121Z · LW(p) · GW(p)
Thanks for writing this! I really think people should be doing this (applying well-known algorithms to interesting datasets and seeing what happens) a lot more often overall, and it's on my list of skills I'd really like to learn personally. So I'd be interested to hear a little more info on methodology - what programming language(s) you used, how you generated the graphs, etc.
I'm pretty skeptical of making any connections to the Bay Area rationalist community based on Berkeley's conscientiousness score (which I think is interesting but not for this reason). There are 100,000 people living in Berkeley, and most of them aren't rationalists. And depending on how far back most of this data was collected, plausibly most of the Berkeley respondents were high school or college students (UC Berkeley alone has over 35,000 students), since for a while that was the main demographic of Facebook users, and probably for a while longer that was the main demographic of Facebook users willing to take personality tests. (Edit: But see Douglas_Knight's comment below.) In general I'd think more about selection effects like this before drawing any conclusions.
Replies from: JonahSinick, taygetea, Douglas_Knight↑ comment by JonahS (JonahSinick) · 2016-12-21T04:26:16.599Z · LW(p) · GW(p)
Glad you liked it :-).
So I'd be interested to hear a little more info on methodology - what programming language(s) you used, how you generated the graphs, etc.
I used R for this analysis. Some resources that you might find relevant:
- Practical Data Science with R has a very nice introduction to exploratory data analysis.
- Advanced R goes into more detail on the language.
- The graphs were made using ggplot2.
- I used the lme4 package for Bayesian hierarchical modeling. See, e.g. Getting Started with Mixed Effect Models in R.
- Kaggle Kernels has some good sample scripts.
And depending on how far back most of this data was collected, plausibly most of the Berkeley respondents were high school or college students (UC Berkeley alone has over 35,000 students), since for a while that was the main demographic of Facebook users, and probably for a while longer that was the main demographic of Facebook users willing to take personality tests.
Douglas_Knight is correct – the average age of users is quite low, at ~26 years old both for the high conscientiousness cities and the low conscientiousness cities.
Replies from: Qiaochu_Yuan↑ comment by Qiaochu_Yuan · 2016-12-23T19:16:28.818Z · LW(p) · GW(p)
Thanks for the links!
↑ comment by taygetea · 2016-12-20T23:53:18.578Z · LW(p) · GW(p)
I think you have the causality flipped around. Jonah is suggesting that something about Berkeley contributes to the prevalence of low conscientiousness among rationalists.
Replies from: JonahSinick, John_Maxwell_IV↑ comment by JonahS (JonahSinick) · 2016-12-21T04:32:49.367Z · LW(p) · GW(p)
What I had in mind was that the apparent low average conscientiousness in the Bay Area might have been one of the cultural factors that drew rationalists who are involved in the in-person community to the location. But of course the interpretation that you raise is also a possibility.
Replies from: taygetea↑ comment by John_Maxwell (John_Maxwell_IV) · 2016-12-21T23:44:29.271Z · LW(p) · GW(p)
Previously on LW: Self control may be contagious
↑ comment by Douglas_Knight · 2016-12-20T22:11:57.880Z · LW(p) · GW(p)
Actually, two of your complaints cancel out. You should expect that the population living in Berkeley has a very young personality, but if all the data is from college students, then there's nothing special about Berkeley (except that it is large and thus small effects are statistically significant — but the claim is that it has a large effect).
I think you are correct that the data is all college students (or at least fairly young people). I believe this because the cities being discussed are the hometown, not the current residence, which is the kind of thing you'd do with college students. In any event, studying hometown controls for the age demographics of Berkeley. But Jonah should have explicitly controlled for age.
Added: poking around the website I don't see a clear answer to how old the data is. Most of it seems to have been collected by 2011, but I'm not sure because there are lots of variations. Each big5 score is labeled with the date taken.
Replies from: Qiaochu_Yuan↑ comment by Qiaochu_Yuan · 2016-12-22T18:41:16.784Z · LW(p) · GW(p)
I think you are correct that the data is all college students (or at least fairly young people). I believe this because the cities being discussed are the hometown, not the current residence, which is the kind of thing you'd do with college students. In any event, studying hometown controls for the age demographics of Berkeley. But Jonah should have explicitly controlled for age.
Good point, I missed this.
comment by devi · 2016-12-20T23:16:43.355Z · LW(p) · GW(p)
However, this likely understates the magnitudes of differences in underlying traits across cities, owing to people anchoring on the people who they know when answering the questions rather than anchoring on the national population
I think this is a major problem. This is mainly based on taking a brief look at this study a while back and being very suspicious of it explicitly contradicting so many of my models (e.g. South America having lower Extraversion than North America, and East Asia being the least Conscientious region).