Which personality traits are real? Stress-testing the lexical hypothesis
post by tailcalled · 2023-06-21T19:46:03.164Z · 4 comments
This post is also available on my Substack. Thanks to Justis Mills for proofreading and feedback!
Most scientific personality models are, directly or indirectly[1], based on the lexical hypothesis, which roughly states that important personality traits correspond to abstract, behavior-descriptive adjectives. For example, the Big Five was created by having people rate themselves using words like "outgoing", "hard-working" and "kind", and finding patterns in these ratings. It is neat that one can create models this way, but relying on abstract adjectives introduces so much abstraction that it raises serious questions about how "real" the resulting personality traits are.
I have created a new personality test, currently named Targeted Personality Test. I have multiple goals with this test, but one of them is to investigate which personality traits are “real”[2] without relying on the lexical hypothesis. I do this mainly by assessing lots of specific narrow behaviors, rather than abstract vague adjectives.[3]
By the end of this blog post, I hope to have introduced some concepts that make sense of my approach, and thereby enable you to understand this diagram I made summarizing my results:
A semi-formal explanation of what is going on in this chart takes a while, so before we proceed, here is a brief, informal preview of the three measures it shows:
- Trait impact: a measure of how strongly the personality trait influences the various behaviors and thoughts that we would expect it to.
- Factor model loss: a measure of how much the personality trait conflates different unrelated things together.
- Correlation with lexical notion: a measure of how well-labelled the personality trait is. (You can mostly ignore this variable as all of the personality traits performed reasonably well on it.)
Easy-Goingness: An example
A conventional personality test such as the SPI-81-27&5 might measure a personality trait such as Easy-Goingness by asking how well a few abstract statements describe you, e.g.:
- I like to take it easy
- I like a leisurely lifestyle
- I have a slow pace to my life
It seems plausible that someone who agrees that such statements describe them would be Easy-Going in some sense, and indeed I bet this sort of measure can pass all sorts of criteria used by psychologists to evaluate the quality of the test.[4]
So if it is probably valid by the standard criteria, what could go wrong that the standard criteria don’t test for?
Well, let’s imagine the sort of person who is Easy-Going. They probably tend to relax in their free time, e.g. watching TV, and they probably don’t get worked up about controversial stuff, and they probably don’t go above and beyond at work. Basically, a relaxed person who doesn’t get too stressed or excited about things.
When we call the person above “Easy-Going”, is this just a convenient label we use for someone who happens to have a constellation of traits like the above? Or are we saying that there is some underlying factor, like a motive to take it easy, which causes them to have these sorts of characteristics? Or maybe there are some underlying factors, but they are heterogeneous and “Easy-Goingness” lumps them together? These are the sorts of questions I tried to investigate.
My first step was to come up with a more concrete characterization of Easy-Goingness than abstract statements like “I like to take it easy” or “I have a slow pace to my life”. I did this by giving the SPI-81-27&5 test to a bunch of people, and then asking those who scored high or low in Easy-Goingness to describe an example of how they were or were not Easy-Going. To give you a taste of the answers, one person who scored high in Easy-Goingness said:
When I finish work for the day I often go straight home and jump into my pyjamas. I like to relax and watch some tv and films to unwind after a long day - usually with a glass of wine. Certain days when I come home my partner would like to travel for a couple hours to go dog walking and enjoying time outside. No matter what kind of day I have at work I am always keen to do anything my partner/family/friends would like to do as is in my nature.
Meanwhile, a person who scored low in Easy-Goingness said:
A simple example is that when I arrive at work, my boss often asks at once if I want a coffee, as he often wants one at the beginning of the day. I prefer to do some work before having a coffee, as to me it signifies a moment of relaxation and to the puritan work ethic part of me, it doesn't make sense to have a break until I have "earned" it.
Based on 10 descriptions like this, I constructed some statements that would be reflective of Easy-Going people (+), or non-Easy-Going people (-):
- (+) In the evening I tend to relax and watch some videos/TV
- (+) I don’t feel the need to arrange any elaborate events to go to in my free time
- (+) I think it is best to take it easy about exams and interviews, rather than worrying a bunch about doing it right
- (+) I think you’ve got to have low expectations of others, as otherwise they will let you down
- (-) I get angry about politics
- (-) I have a stressful job
- (-) I don’t feel like I should have breaks at work unless I’ve “earned” them by finishing something productive
- (-) I spent a lot of effort on parenting
I included these statements in the Targeted Personality Test, as well as similar statements I designed for the 26 other personality traits from the test that I based this study on, and some additional statements that were useful for research purposes.
These statements are quite unlike the typical statements used in personality tests, because they are intentionally aiming to be much more narrow and concrete. This probably makes them less “efficient” in the sense that respondents will have to answer a lot more questions before we can get a detailed view of what they are like.
However, their concreteness and narrowness also let us test more strongly how real the traits are, e.g. whether it is just a coincidence that these behaviors sometimes co-occur in “Easy-Going” people, and whether the statements conflate multiple unrelated traits.
Trait Impact as a measure of realness
I have multiple measures of realness, but I think the most important measure of whether a personality trait is real is whether the associated behaviors do in fact correlate with each other, rather than them just sometimes coincidentally occurring together.
The simplest way to visualize whether this applies is with a correlation matrix, which is a diagram that shows how strongly a set of variables correlate with each other:
Here, each row and column represents a personality statement that I asked people to rate themselves on, and each cell represents the Pearson correlation between the row variable and the column variable. While the exact correlations varied, on average the different concrete behaviors associated with Easy-Goingness correlated at only about 0.06 with each other. If you are not familiar with Pearson correlations, here is a visualization of how weak 0.06 is:
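A summary figure like this can be computed from raw ratings in a few lines. The sketch below is a minimal illustration, not the author's code: it assumes responses are stored as a respondents × items array and that the (-) items are reverse-keyed by multiplying them by -1, which flips the sign of their correlations exactly as reverse-scoring a rating scale would. The function name mean_inter_item_correlation and the toy ratings are hypothetical.

```python
import numpy as np


def mean_inter_item_correlation(responses, keys):
    """Return the item correlation matrix and its average off-diagonal value.

    responses: array-like of shape (n_respondents, n_items)
    keys: +1 for items keyed toward the trait, -1 for reverse-keyed (-) items
    """
    # Multiplying a reverse-keyed item by -1 flips the sign of its correlations,
    # just as reverse-scoring it on a rating scale would.
    scored = np.asarray(responses, dtype=float) * np.asarray(keys, dtype=float)
    corr = np.corrcoef(scored, rowvar=False)  # columns are items
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return corr, float(off_diagonal.mean())


# Hypothetical toy ratings: 4 respondents, 3 items, the third item reverse-keyed.
toy = [[1, 2, 5], [2, 3, 4], [3, 3, 2], [4, 5, 1]]
corr, average = mean_inter_item_correlation(toy, keys=[+1, +1, -1])
```

Applied to the real Easy-Goingness items, the average off-diagonal value is the kind of number summarized above as about 0.06.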
This suggests to me that Easy-Goingness is not very “real”. While it might make sense to describe a person as doing something Easy-Going, for instance when they are watching TV, it is kind of arbitrary to talk about people as being more or less Easy-Going, because it depends a lot on context/what you mean.
If we take the square root of the typical correlation between two behaviors associated with the trait, we get the extent to which each behavior correlates with the overall level of the trait. This works because, if two behaviors are connected only through a shared trait that each correlates r with, they correlate r × r = r² with each other. Doing this for Easy-Goingness gives an effect of around 0.25 (√0.06 ≈ 0.245). This is a somewhat stronger connection, but still quite weak:
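The square-root relationship can be checked by simulation. This is a minimal sketch assuming the simplest single-factor model (every behavior is a loading times a shared trait plus independent noise, all standardized). The loading of 0.25 is picked only to mirror the Easy-Goingness figure, the data are simulated rather than taken from the survey, and simulate_single_factor is a hypothetical helper.

```python
import numpy as np


def simulate_single_factor(loading, n_items, n_people, seed=0):
    """Simulate standardized items that share one latent trait.

    Each item = loading * trait + independent noise, with the noise scaled
    so that every item has unit variance.
    """
    rng = np.random.default_rng(seed)
    trait = rng.standard_normal(n_people)
    noise = rng.standard_normal((n_people, n_items)) * np.sqrt(1.0 - loading**2)
    items = loading * trait[:, None] + noise
    return trait, items


trait, items = simulate_single_factor(loading=0.25, n_items=8, n_people=100_000)
corr = np.corrcoef(items, rowvar=False)
mean_r = corr[~np.eye(8, dtype=bool)].mean()  # should land near 0.25**2 = 0.0625
implied_loading = np.sqrt(mean_r)              # should land near 0.25
# Direct check: each item's correlation with the simulated trait itself.
item_trait_r = np.array([np.corrcoef(items[:, k], trait)[0, 1] for k in range(8)])
```

With this many simulated people, the square root of the average inter-item correlation should agree closely with each item's directly measured correlation with the trait.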
The fact that this is weak means that even the most Easy-Going people cannot necessarily be expected to be particularly Easy-Going in all contexts. It is much more subtle than that.
The “Trait impact” axis in the diagram at the start of the post shows this correlation for all of the different traits.
I picked Easy-Goingness as an example because it had the lowest “Trait impact”. It may also be informative to look at an example with a high “Trait impact”. The highest “Trait impact” was Art Appreciation, but it feels too narrow, so I am going to skip over it[5] and consider Conservatism as an example of a trait with a high “Trait impact”.
The correlation matrix for Conservatism looks like the following:
As you can see, and as you probably expected, there are strong correlations between different Conservative/Progressive responses. Visualized as a scatterplot, it might look like this:
Of course this is still far from deterministic, but now it looks like we’ve got something fairly strong. Ideology seems more “real” than Easy-Goingness, in the sense measured by “Trait impact”.
Factor model loss as a measure of conflation
One of my other measures of personality trait realness was “Factor model loss”. What does that mean? Let’s take one of the traits that scored the worst in “Factor model loss”: Creativity.
If we look carefully, we can see that there are two distinct groups of items:
- Creative problem-solving: finding root causes for problems at work, copying old methods instead of coming up with new ones at work, being good at ideas during brainstorming
- Artistic creativity: creating decorations, quizzes/games/adventures/trips, being imaginative
Two of the items, involving creating visualizations and coming up with fictional stories, correlated with both Creative problem-solving and Artistic creativity. Meanwhile, the math vs humanities item didn’t really correlate with either.
Thus, it seems that the term “Creativity” is problematic as a personality trait, because it conflates Creative problem-solving with Artistic creativity, treating them as being the same thing when really they are basically unrelated.
To quantify the extent of the problem, I approximated what the correlation matrix would have to look like if there were absolutely no conflation problem, i.e. if a single trait of Creativity covered both Creative problem-solving and Artistic creativity. I got this result:
“Factor model loss” refers to the size of the difference between these two correlation matrices: the observed correlations versus the hypothetical correlations if there were only a single trait.
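In the comment thread the score is described as a least-squares loss for a CFA-style factor model, but the exact fitting procedure is not spelled out. The sketch below is one simple way to compute such a loss, assuming that a single factor is fitted by minimizing squared differences between observed and implied off-diagonal correlations (coordinate descent, updating each loading to its exact least-squares value given the others), and that the loss is the sum of squared residuals over item pairs. The function names and example matrices are hypothetical.

```python
import numpy as np


def fit_one_factor(corr, n_sweeps=2000):
    """Fit single-factor loadings to the off-diagonal correlations by least squares."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    loadings = np.full(n, 0.5)  # positive starting values
    for _ in range(n_sweeps):
        for i in range(n):
            others = np.arange(n) != i
            denom = np.sum(loadings[others] ** 2)
            if denom > 0:
                # Exact minimizer of the squared error for loading i, others held fixed.
                loadings[i] = corr[i, others] @ loadings[others] / denom
    return loadings


def factor_model_loss(corr):
    """Sum of squared gaps between observed and one-factor-implied correlations."""
    corr = np.asarray(corr, dtype=float)
    loadings = fit_one_factor(corr)
    implied = np.outer(loadings, loadings)
    np.fill_diagonal(implied, 1.0)  # items correlate 1 with themselves
    pairs = np.triu_indices(corr.shape[0], k=1)  # count each item pair once
    return float(np.sum((corr[pairs] - implied[pairs]) ** 2)), implied


# Hypothetical matrices: one that fits a single factor exactly, and one made of
# two unrelated clusters (like Creative problem-solving vs Artistic creativity).
loads = np.array([0.8, 0.7, 0.6, 0.5])
one_factor = np.outer(loads, loads)
np.fill_diagonal(one_factor, 1.0)
cluster = np.full((3, 3), 0.5)
np.fill_diagonal(cluster, 1.0)
two_clusters = np.block([[cluster, np.zeros((3, 3))], [np.zeros((3, 3)), cluster]])

loss_single, _ = factor_model_loss(one_factor)       # essentially zero
loss_conflated, _ = factor_model_loss(two_clusters)  # clearly positive
```

Counting each pair once and ignoring the diagonal is a design choice; including the diagonal or weighting the residuals would change the absolute numbers but not which traits look conflated.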
Correlation with lexical notion: naming things
The final notion of “realness” in my diagram was “Correlation with lexical notion”. What does that mean? Well, remember how I keep separating things into “concrete” and “abstract” descriptors?
I think of the “abstract” descriptors as measuring the informal, common-sense version of the trait. You are probably easy-going if you think you are easy-going, conservative if you think you are conservative, and creative if you think you are creative. Whether you strictly fit may be a matter of definition, but self-description certainly seems like a good starting point.
But the fact that we can measure the common-sense notion of the trait separately from the behaviors associated with the trait raises the question: Do these measure the same thing? For instance, maybe there was a flaw in the way we collected examples of behaviors, so that they don’t correspond to what the trait is actually like.
I quantified this with “Correlation with lexical notion”. It is based on[6] the correlations between the abstract and the concrete questions.
However, there turns out to be not much more to say about this, because all of the traits did well on it; the “Correlation with lexical notion” was consistently close to 1, showing that the concrete and the abstract descriptors were getting at the same thing. (And when I inspected the traits that did worst, it often seemed to be because of a technical form of noise that I am not going to get into.)
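Footnote [6] says the raw abstract/concrete correlation is adjusted for imperfect internal correlations, without giving the formula. A classic adjustment of this kind is Spearman's correction for attenuation, sketched below assuming reliability estimates are available for both scales; whether the post used exactly this formula is an assumption, and the function name is hypothetical.

```python
import math


def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Spearman's correction for attenuation.

    Estimates how strongly two scales would correlate if neither contained
    measurement noise, given each scale's reliability (between 0 and 1).
    """
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical numbers: an observed correlation of 0.3 between two scales
# whose reliabilities are both 0.5 corresponds to a corrected value of 0.6.
corrected = disattenuated_correlation(0.3, 0.5, 0.5)
```

Because the reliabilities are themselves estimates, corrected values can land slightly above 1 in practice, which is worth keeping in mind when reading values "close to 1".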
Summary
I have three different measures of the realness of a personality trait:
- Trait impact: how strongly the personality trait influences the various behaviors and thoughts that we would expect it to.
- Factor model loss: how much the personality trait conflates different unrelated things together.
- Correlation with lexical notion: how well-labelled the personality trait is.
Since the correlation with the lexical notion is consistently high, it appears that the personality traits have been assigned reasonably descriptive labels; however, some of the labels conflate multiple personality traits, such as:
- Creativity (appears to conflate Creative problem-solving and Artistic creativity)
- Charisma (appears to conflate Interpersonal Sensitivity and Social Ease)
- Emotional Stability (appears to conflate Problem-Handling Confidence and (opposite of) Catastrophizing)
- Conformity (appears to conflate Government Conformity, Aesthetic Conformity, and Religiosity)
- Authoritarianism (appears to conflate Political Authoritarianism and Law Adherence)
- Attention-Seeking (appears to conflate Benign Narcissism and (opposite of) Shyness)
To see more about what is getting conflated, skip to the appendix, where I show the correlation matrices for each of the traits.
But most importantly, a lot of traits are not that impactful. Examples of non-impactful traits include Easy-Goingness, Conformity, Irritability, Perfectionism, Sensation-Seeking, Trust, Compassion and Impulsivity. While people to some extent exhibited general, context-independent differences from each other in these traits, the differences were small relative to the context-dependent differences. So rather than seeing people as e.g. trusting or non-trusting in general, it may be much more productive to ask who they trust and who they don’t trust.
One caveat: I think the trait impact for Orderliness could be overestimated, because in practice participants seem to have interpreted half of the questions as being about how tidy they kept their own home, which might be much narrower than general Orderliness.
Bonus: Going beyond the lexical hypothesis
Many of the 27 traits in the original test turned out to be problematic for my purposes.[7] You might think this shows my test to be irreparably flawed, but actually I had sort of hoped this would happen.
It is possible to use a statistical technique called factor analysis to identify patterns of correlations in empirical data. This is the technique that was used to create the original SPI test that I based my test on, and it is also the technique that has been used for many other psychometric tests.
Using factor analysis, I reshuffled the items from my test into 7 alternate factors, and made a version of the test that is less than a third of the length of the original one. In the future, I will likely write an in-depth description of how factor analysis works and which factors I have found.
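The post does not say which extraction or rotation method produced the 7 factors. As a hedged illustration of one early step in such an analysis, the sketch below applies the Kaiser criterion (keep as many factors as the correlation matrix has eigenvalues above 1), a common but rough heuristic for how many factors to extract; it is not necessarily what was used here, and the function name and example matrix are hypothetical.

```python
import numpy as np


def kaiser_factor_count(corr):
    """Count eigenvalues of a correlation matrix above 1 (the Kaiser criterion)."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))[::-1]  # descending
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


# Hypothetical correlation matrix: two unrelated clusters of three items each.
cluster = np.full((3, 3), 0.5)
np.fill_diagonal(cluster, 1.0)
two_clusters = np.block([[cluster, np.zeros((3, 3))], [np.zeros((3, 3)), cluster]])
n_factors, eigenvalues = kaiser_factor_count(two_clusters)  # eigenvalues 2, 2, then 0.5s
```

The Kaiser criterion is known to be crude; parallel analysis or comparing model fit are common alternatives, and a full factor analysis would go on to estimate and rotate the loadings.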
This is almost certainly not the final form of the test. I have many plans for additional investigations I can perform as I get more data.
Appendix: Correlation matrices for all of the traits
[1] The Big Five personality factors were originally derived by asking people to rate themselves on a large number of personality adjectives, and using statistics to find the biggest clusters of related descriptors. Other tests have been developed through other methods, many of which don’t primarily focus on abstract adjectives, though for reasons I won’t get into right now, I think they have a lot of dependence on the lexical hypothesis.
[2] Of course, this is a subtle, complex question which depends on what exactly one means by “real”. I define the notion of “realness” I focus on later in the post, but other notions may be relevant for other purposes.
[3] Because behavior is inherently difficult to measure, I still had to rely on self-report surveys.
[4] These criteria include internal reliability, i.e. the sort of person who says “I like a leisurely lifestyle” is also more likely to say “I have a slow pace to my life”; test-retest reliability, i.e. the sort of person who says “I like a leisurely lifestyle” today will also tend to do so tomorrow, in a month, in a year, or in a decade; inter-rater validity, i.e. if a person says “I like a leisurely lifestyle” then their friends and family will also tend to say “They like a leisurely lifestyle”; criterion validity, i.e. the sort of person who says “I like a leisurely lifestyle” scores higher on some objective criterion of leisurely lifestyle such as number of vacation days; and maybe also heritability, i.e. if one twin in a pair says “I like a leisurely lifestyle” then the other twin likely says so too.
[5] The narrower the trait you are considering, the stronger the associated correlations will be. To see this, consider the absurd example where you only consider a single specific behavior, say watching TV. Any variable has a correlation of 1 with itself, so watching TV would have a Trait Impact of 1. It is only by abstracting over multiple different behaviors that Trait Impact can be nontrivial.
[6] Since the different questions don’t correlate perfectly with each other (e.g. “I enjoy cooking food for other people” and “I like to dance with people at parties” only correlate at 0.24), we can’t expect an abstract trait like “sociability” to correlate perfectly with either. So I adjust for the reduction in correlation that would be expected from imperfect internal correlations.
[7] Not necessarily for all purposes. Just because a trait is weak by my measures does not mean it cannot be relevant by other measures. Talk with personality researchers and read their papers if you want to find out what criteria they care about.
4 comments
comment by Unnamed · 2023-06-21T23:35:28.295Z
You may want to look into the crisis in personality psychology which was sparked by Walter Mischel's (1968) book "Personality and assessment". There were a lot of studies, and arguments between researchers, about questions like these.
Mischel's challenge: there are often low-seeming correlations between broad personality measures and specific behaviors.
Part of the response was that the correlations are much larger if you aggregate across many behaviors, e.g. instead of the correlation between an abstract rating of conscientiousness and how much a person engages in a single specific conscientiousness-related behavior, look at the correlation between an abstract rating of conscientiousness and the average of how much a person engages in 50 specific conscientiousness-related behaviors. This suggests that there is some sort of broad trend that the person carries, even if any one behavior depends on a mix of things beyond that broad trend.
Mischel argued for also looking for narrower patterns that are more stable for a person rather than just these broad traits, e.g. a person might pretty consistently be talkative with their friends, even if they don't consistently engage in some other extraversion-related behaviors.
↑ comment by tailcalled · 2023-06-22T06:29:47.646Z
Sounds neat, I will have to take a look.
One thing to add is, one way you can interpret my "correlation with lexical notion" is as saying "what happens when we average infinitely many behaviors?". Since all the traits had a high "correlation with lexical notion", it seems I got the same result as the personality researchers.
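The aggregation point can be made concrete with a small calculation. This sketch assumes a single factor with equal loadings, so that each behavior correlates with the factor at the square root of the mean inter-item correlation; under that assumption, the average of k behaviors correlates with the factor at the square root of its Spearman-Brown reliability. The 0.06 value is borrowed from the Easy-Goingness figure purely as an illustration, and the function name is hypothetical.

```python
import math


def aggregate_trait_correlation(mean_inter_item_r, n_items):
    """Correlation between the average of n_items behaviors and their common factor.

    Assumes one factor with equal loadings, so each behavior correlates
    sqrt(mean_inter_item_r) with the factor.
    """
    k, r = n_items, mean_inter_item_r
    reliability = k * r / (1 + (k - 1) * r)  # Spearman-Brown formula
    return math.sqrt(reliability)


for k in (1, 8, 50, 1000):
    print(k, round(aggregate_trait_correlation(0.06, k), 3))
```

Even with a weak 0.06 inter-item correlation, the correlation rises from about 0.24 for a single behavior toward 1 as more behaviors are averaged, which is the "infinitely many behaviors" limit described above.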
comment by Daniel V · 2023-06-22T18:51:22.872Z
It's very interesting to see the intuitive approach here and there is a lot to like about how you identified something you didn't like in some personality tests (though there are some concrete ones out there), probed content domains for item generation, and settled upon correlations to assess hanging-togetherness.
But you need to incorporate your knowledge from reading about scale development and factor analysis. Obviously you've read in that space. You know you want to test item-total correlations (trait impact), multi-dimensionality (factor model loss), and criterion validity (correlation with lexical notion). Are you trying to ease us in with a primer (with different vocabulary!) or reinvent the wheel?
Let's start with the easy-goingness scale:
- (+) In the evening I tend to relax and watch some videos/TV
- (+) I don’t feel the need to arrange any elaborate events to go to in my free time
- (+) I think it is best to take it easy about exams and interviews, rather than worrying a bunch about doing it right
- (+) I think you’ve got to have low expectations of others, as otherwise they will let you down
- (-) I get angry about politics
- (-) I have a stressful job
- (-) I don’t feel like I should have breaks at work unless I’ve “earned” them by finishing something productive
- (-) I spent a lot of effort on parenting
The breadth of it is either a strength or a weakness. It'd be nice to have a construct definition or at least some gesturing at what easy-goingness actually is to gauge the face-validity of these items. Concrete items necessarily will have some domain-dependence, resulting in deficiency (e.g., someone who likes to relax and read a book will score low on item 1) or contamination (e.g., having low expectations of others might also be trait pessimism), but item 8 is really specific. It hampers the ability of this scale to capture easy-goingness among non-parents. The breadth would be good if it captured variations on easy-goingness, but instead it'd be bad if it just captures different things that don't really relate to each other. That's especially problematic because then the inference from low inter-correlations might not be that the construct is bad, but that the items just don't tap into it. You can see where I'm going with this because...
This suggests to me that Easy-Goingness is not very “real”. While it might make sense to describe a person as doing something Easy-Going, for instance when they are watching TV, it is kind of arbitrary to talk about people as being more or less Easy-Going, because it depends a lot on context/what you mean.
...indeed, the items are mainly just capturing different things, not reflecting on easy-goingness in any way. From a scale-assessment standpoint, it's great to see the results confirm my unease about the items based on simply reading them.
The fact that this is weak means that even the most Easy-Going people cannot necessarily be expected to be particularly Easy-Going in all contexts.
This statement presumes your measure reflects a higher-order easy-goingness and that context-specific easy-goingnesses are also being adequately measured.
With conservatism, on the other hand, you can see there is some context-specificity (e.g., dress vs. general social views vs. issue-based ideology), but the measure is facially better. And it hangs together better. Alternately, you might explore those contours and say you've come up with a multi-dimensional conservatism scale, just like you have a multi-dimensional creativity scale.
the “Correlation with lexical notion” was consistently close to 1, showing that the concrete and the abstract descriptors were getting at the same thing.
There's an implicit "when the concrete descriptors actually had face validity" hidden here; low correlation with the lexical notion could indicate a problem with the lexical scale or a problem with the concrete scale, or both.
Overall, I am very impressed that you presented a scary chart to start, promised you'd explain it, and successfully did so. The general takeaway from it is that the lexical hypothesis could be pretty sound and a few of these might be multidimensional in nature (or it could be that some items are good and some are bad). For the low trait impact scales, it's a question of whether the items are good and the construct isn't "real," or whether the items are just a bad measurement approach.
↑ comment by tailcalled · 2023-06-22T21:10:43.002Z
Thank you for your in-depth response!
But you need to incorporate your knowledge from reading about scale development and factor analysis. Obviously you've read in that space. You know you want to test item-total correlations (trait impact), multi-dimensionality (factor model loss), and criterion validity (correlation with lexical notion). Are you trying to ease us in with a primer (with different vocabulary!) or reinvent the wheel?
Good question. In retrospect, I should probably have put more effort into using standard terms. That said:
- Test item-total correlations: Strictly speaking "factor loadings" would be a better term, since I did not compute it based on a correlation with a test score, but instead with a CFA-style factor model.
- Multidimensionality: Maybe. Obviously it's multidimensionality that I am trying to test, but my score is literally the least-squares loss of a CFA-style factor model.
- Criterion validity: Maybe. Arguably convergent/concurrent validity would be even more standard terms. But I think "Correlation with lexical notion" is more specific.
The breadth of it is either a strength or a weakness. It'd be nice to have a construct definition or at least some gesturing at what easy-goingness actually is to gauge the face-validity of these items.
The items are each meant to assess something from the stories I collected from people who empirically scored high or low on easy-goingness scales. So their validity criterion is not whether they assess easy-goingness generally, but whether they assess the thing from those stories. Here are the stories corresponding to each item:
- (+) In the evening I tend to relax and watch some videos/TV
When I finish work for the day I often go straight home and jump into my pyjamas. I like to relax and watch some tv and films to unwind after a long day - usually with a glass of wine. Certain days when I come home my partner would like to travel for a couple hours to go dog walking and enjoying time outside. No matter what kind of day I have at work I am always keen to do anything my partner/family/friends would like to do as is in my nature.
- (+) I don’t feel the need to arrange any elaborate events to go to in my free time
I think I dont need to always go out in evenings to feel socially connected. Rather I would sit and enjoy the quiet at home. Moreover I dont get easily flustered if people have different opinions compared to me. I dont easily get offended and can take things in a right spirit. so, I am easy to approach
- (+) I think it is best to take it easy about exams and interviews, rather than worrying a bunch about doing it right
I think youy have to be going in life otherwise everything will get to to. For example when i did my exams at uni and school, you have to be easy going to cope with the stress and fear thatr comes with them. This can be applied to anything though, if you are not easy going the littel things will get to you and you will have no chnace being able to cope with the big issues in life.
- (+) I think you’ve got to have low expectations of others, as otherwise they will let you down
I am easy going in that I do not have high expectations of others because I have learnt that people let you down and if there was no expectation in the first place you cannot be surprised or disappointed, on the other hand if you expect nothing you can be quite pleasantly surprised. I always try to see both sides of any argument or situation and consider that everyone has the right to an opinion that does not have to match my own.
- (-) I get angry about politics
I was in a team dinner party and in a discussion about politics which I joined in with other colleuages. There was a lot of talk about dealing with education, the economy and how to restore the leadership of the labour party back then and the Iraq war all of which I was onboard with . then came questions about what to do with flooding immigrants and how to control them, given my uncles both were illegal immigratns back then but managed to claw citizenship after 10 years I was uneasy joining the discussion and there was a lot of talk on what races were the culprits. I said only legal immigration should be allowed but did not join further focusing on my drink instead knowing the discussion was a race hate discussion and I was indirectly being attacked. Next 10 mins I made up an excuse to leave and left the party but faked goodbyes but was angry that I had to work with scumbag colleagues.
- (-) I have a stressful job
- (-) I spent a lot of effort on parenting
My life is not at all leisurely. I have two small children and a stressful job. If I were to be easy going about everything things wouldn’t get done and our lives would feel chaotic. There needs to be a balance between being easy going and highly strung. I don’t like to forget things that need doing or let people down.
- (-) I don’t feel like I should have breaks at work unless I’ve “earned” them by finishing something productive
A simple example is that when I arrive at work, my boss often asks at once if I want a coffee, as he often wants one at the beginning of the day. I prefer to do some work before having a coffee, as to me it signifies a moment of relaxation and to the puritan work ethic part of me, it doesn't make sense to have a break until I have "earned" it.
There's definitely a lot from these stories that I fail to capture. Often the participants mention multiple things and I only ask about one of them. I could easily imagine the items could be made better.
item 8 is really specific. It hampers the ability of this scale to capture easy-goingness among non-parents.
Maybe.
Really, in the general population, most people are parents, so I don't think it is much more specific than the other items. But my respondents skew quite young, so it is probably a problem for my sample. It might be interesting to add an interaction model to this later, though.
This statement presumes your measure reflects a higher-order easy-goingness and that context-specific easy-goingnesses are also being adequately measured.
With conservatism, on the other hand, you can see there is some context-specificity (e.g., dress vs. general social views vs. issue-based ideology), but the measure is facially better. And it hangs together better. Alternately, you might explore those contours and say you've come up with a multi-dimensional conservatism scale, just like you have a multi-dimensional creativity scale.
🤷 I constructed the conservatism and easy-goingness items in the same way, so I think there is something inherent to conservatism that makes it cohere more than easy-goingness.
There's an implicit "when the concrete descriptors actually had face validity" hidden here; low correlation with the lexical notion could indicate a problem with the lexical scale or a problem with the concrete scale, or both.
I think of it as an empirical test of the concrete descriptors' validity. That is, the abstract descriptors have face validity, and if these are highly correlated with the concrete descriptors, then at least we know the concrete descriptors are not measuring anything other than the traits they are intended to measure.