## Posts

Rugby & Regression Towards the Mean 2019-10-30T16:36:00.287Z · score: 16 (4 votes)
Age gaps and Birth order: Reanalysis 2019-09-07T19:33:16.174Z · score: 49 (10 votes)
Age gaps and Birth order: Failed reproduction of results 2019-09-07T19:22:55.068Z · score: 63 (16 votes)
What are principled ways for penalising complexity in practice? 2019-06-27T07:28:16.850Z · score: 42 (11 votes)
How is Solomonoff induction calculated in practice? 2019-06-04T10:11:37.310Z · score: 35 (7 votes)
Book review: My Hidden Chimp 2019-03-04T09:55:32.362Z · score: 31 (13 votes)
Who wants to be a Millionaire? 2019-02-01T14:02:52.794Z · score: 29 (16 votes)
Experiences of Self-deception 2018-12-18T11:10:26.965Z · score: 16 (5 votes)
Status model 2018-11-26T15:05:12.105Z · score: 29 (10 votes)
Bayes Questions 2018-11-07T16:54:38.800Z · score: 22 (4 votes)
Good Samaritans in experiments 2018-10-30T23:34:27.153Z · score: 130 (53 votes)
In praise of heuristics 2018-10-24T15:44:47.771Z · score: 44 (14 votes)
The tails coming apart as a strategy for success 2018-10-01T15:18:50.228Z · score: 33 (17 votes)
Defining by opposites 2018-09-18T09:26:38.579Z · score: 19 (10 votes)
Birth order effect found in Nobel Laureates in Physics 2018-09-04T12:17:53.269Z · score: 61 (19 votes)

Comment by bucky on Bayesian examination · 2019-12-11T12:21:34.489Z · score: 2 (1 votes) · LW · GW

I think the 100k drop analogy may be misleading when thinking about the final result. The final score in the version I envisage is judged on ratios between results, rather than absolute values (my explanation maybe isn't clear enough on this). In that case putting everything on the answer which you have 60% confidence in and being right gives a ratio of 1.67 in your favour over an honest reporting. But if you do it and get it wrong then there is an infinite ratio in favour of the honest reporting.

Comment by bucky on Bayesian examination · 2019-12-11T07:21:28.964Z · score: 2 (1 votes) · LW · GW

This is true if scores from different questions are added but not if they are multiplied. Linear scoring with multiplication is exactly the same as log scoring with addition, just easier to visualise (at least to me)
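
Spelled out, writing $p_i$ for the probability placed on the correct answer to question $i$ (notation introduced here for illustration):

```latex
\log \prod_{i=1}^{n} p_i = \sum_{i=1}^{n} \log p_i
```

Since the logarithm is monotonic, ranking students by the product gives the same order as ranking them by the summed log score.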

Comment by bucky on Bayesian examination · 2019-12-10T22:39:00.672Z · score: 5 (3 votes) · LW · GW

Great post, I'd be really interested to hear how this goes down with students.

I would be cautious about using information from incorrect answers to calculate the score - just using the percentage given for the correct answer still gives a proper scoring rule. If percentages placed on incorrect answers are included then you get 50:25:25:0 giving more points than 50:50:0:0 when the answer is A which I think people might find hard to swallow.
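
A quick check of the 50:25:25:0 example, assuming the rule in question is the quadratic (Brier) score, which is one proper scoring rule that does use the percentages on incorrect answers. The function names are hypothetical and the penalty form (lower is better) is a choice made for this sketch:

```python
import math

def brier_penalty(probs, correct):
    """Quadratic (Brier) penalty summed over all options; lower is better."""
    return sum((p - (1.0 if i == correct else 0.0)) ** 2
               for i, p in enumerate(probs))

def log_score(probs, correct):
    """Log score, which only looks at the probability on the correct answer."""
    return math.log(probs[correct])

spread = [0.50, 0.25, 0.25, 0.00]   # 50:25:25:0
paired = [0.50, 0.50, 0.00, 0.00]   # 50:50:0:0
answer = 0                          # the answer is A

# Under the quadratic rule the spread-out guess is penalised less...
print(brier_penalty(spread, answer), brier_penalty(paired, answer))
# ...while the log score treats the two reports identically.
print(log_score(spread, answer) == log_score(paired, answer))
```

Under these assumptions the two reports only differ in score when the incorrect answers are included.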

For a proper scoring rule I find a particular framing of a log score to be intuitive - instead of adding the logs of the probabilities placed on the correct answers, just multiply out the probabilities.

This can be visualised as having a heap of points and having to spread them all across the 4 possible answers. You lose the points that were placed on the wrong answers and then use your remaining points to repeat the process for the next question. Whoever has the most points left at the end has done the best. The £100k drop is a game show which is based on this premise.

I personally find this to be an easy visualisation with the added benefit that the scores have a specific Bayesian interpretation - the ratio of students’ scores represent the likelihood function of who knows the subject best based on the evidence of that exam.
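
The heap-of-points picture can be sketched roughly as below. The students, their answers and the starting heap are all made up for illustration; only the multiply-what-survives mechanic comes from the description above.

```python
import math

def points_left(answers, start=100_000.0):
    """Spread all current points across the options each round and keep
    only the share that was placed on the correct option."""
    points = start
    for probs, correct in answers:
        points *= probs[correct]
    return points

# Hypothetical two-question exam: (probabilities over 4 options, index of correct answer).
alice = [([0.6, 0.2, 0.1, 0.1], 0), ([0.1, 0.7, 0.1, 0.1], 1)]
bob = [([0.3, 0.3, 0.2, 0.2], 0), ([0.25, 0.25, 0.25, 0.25], 1)]

# The ratio of final heaps is the likelihood ratio between the two students.
ratio = points_left(alice) / points_left(bob)
print(ratio)

# Same ordering as the usual log score: log of the product is the sum of the logs.
log_total = sum(math.log(probs[correct]) for probs, correct in alice)
print(math.isclose(math.log(points_left(alice, start=1.0)), log_total))
```

Putting 0 on the correct answer at any point empties the heap for good, which is the multiplicative version of a log score of minus infinity.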

Comment by bucky on Symbiotic Wars · 2019-12-04T09:57:39.818Z · score: 5 (3 votes) · LW · GW

This reminds me of the SSC post toxoplasma of rage (see especially section V).

Comment by bucky on Could someone please start a bright home lighting company? · 2019-12-04T09:41:02.998Z · score: 3 (2 votes) · LW · GW

I think this is likely to be orders of magnitude away from the kinds of things which have been effective for others (see e.g. this rough calculation on reddit)

Comment by bucky on How do you assess the quality / reliability of a scientific study? · 2019-12-03T13:52:40.337Z · score: 5 (3 votes) · LW · GW

Thanks Ruby.

Good summary of my answer; by the time I got round to writing mine there were so many good qualitative summaries I wanted to do something different. I think you’ve hit the nail on the head with the main weakness being difficulty in application, particularly in estimating Cohen’s d.

I am currently taking part in replication markets and basing my judgements mainly on experimental power. Hopefully this will give me a better idea of what works and I may write an updated guide next year.

As a data point r.e. the prize, I’m pretty sure that if the prize wasn’t there I would have done my usual and intended to write something and never actually got round to it. I think this kind of prize is particularly useful for questions which take a while to work on and attention would otherwise drift.

Comment by bucky on LessWrong anti-kibitzer (hides comment authors and vote counts) · 2019-11-28T18:52:41.094Z · score: 2 (1 votes) · LW · GW

I don’t think this is available on new lesswrong. It is available if you use greaterwrong.com. See here for instructions.

Comment by bucky on Conversational Cultures: Combat vs Nurture · 2019-11-27T19:16:40.631Z · score: 4 (2 votes) · LW · GW

I’ve referred back to this multiple times and it has helped (e.g. at work) to get people to understand each other better.

Comment by bucky on What's the largest sunk cost you let go? · 2019-11-24T11:21:18.523Z · score: 4 (3 votes) · LW · GW

I suspect a big one for many will be a religion where sunk costs can be in the thousands of hours

Comment by bucky on Do you get value out of contentless comments? · 2019-11-22T08:08:29.138Z · score: 16 (5 votes) · LW · GW

I do appreciate the little comments but for me there's a huge benefit from even a sentence on why they liked it.

For instance this comment definitely had a much larger positive effect on me than a strong upvote:

I really appreciate seeing this kind of applied statistical analysis to a stray interesting-sounding fact you heard.

I doubt this took much longer to write than "Good post!" but the extra time was definitely worth it to me.

Comment by bucky on Hard to find factors messing up experiments: Examples? · 2019-11-19T20:52:52.772Z · score: 2 (1 votes) · LW · GW

Ha, I think I remember seeing a CSI episode based on this story but never realised it was based on a true story. At the time I thought the plot was a tad implausible but that shows what I know!

Comment by bucky on How do you assess the quality / reliability of a scientific study? · 2019-11-18T16:27:01.588Z · score: 3 (2 votes) · LW · GW

1. For reasonable assumptions if you're studying an interaction then you might need 16x larger samples - see Gelman. Essentially standard error is double for interactions and Andrew thinks that interaction effects being half the size of main effects is a good starting point for estimates, giving $(2 \times 2)^2 = 16$ times larger samples.

2. When estimating Cohen's d, it is important that you know whether the study is between or within subjects - within subject studies will give much lower standard error and thus require much smaller samples. Again Gelman discusses.

Comment by bucky on How do you assess the quality / reliability of a scientific study? · 2019-11-18T12:18:17.497Z · score: 6 (4 votes) · LW · GW

I just came across an example of this which might be helpful.

Good grades and a desk 'key for university hopes' (BBC News)

Essentially getting good grades and having a desk in your room are apparently good predictors of whether you want to go to university or not. The former seemed sensible, the latter seemed like it shouldn't have a big effect size but I wanted to give it a chance.

The paper itself is here.

Just from the abstract you can tell there are at least 8 input variables so the numerator on Lehr's equation becomes ~26. This means a Cohen's d of 0.1 (which I feel is pretty generous for having a desk in your room) would require 2600 results in each sample.

As the samples are unlikely to be of equal size, I would estimate they would need a total of ~10,000 samples for this to have any chance of finding a meaningful result for smaller effect sizes.

The actual number of samples was ~1,000. At this point I would normally write off the study without bothering to go deeper, the process taking less than 5 minutes.

I was curious to see how they managed to get multiple significant results despite the sample size limitations. It turns out that they decided against reporting p-values because "we could no longer assume randomness of the sample". Instead they report the odds ratio of each result and said that anything with a large ratio had an effect, ignoring any uncertainty of the results.

It turns out there were only 108 students in the no-desk sample. Definitely what Andrew Gelman calls a Kangaroo measurement.

There are a lot of other problems with the paper but just looking at the sample size (even though the sample size was ~1,000) was a helpful check to confidently reject the paper with minimal effort.

Comment by bucky on How do you assess the quality / reliability of a scientific study? · 2019-11-12T16:28:22.875Z · score: 26 (6 votes) · LW · GW

Often I want to form a quick impression as to whether it is worth me analysing a given paper in more detail. A couple of quick calculations can go a long way. Some of this will be obvious but I've tried to give the approximate thresholds for the results which up until now I've been using subconsciously. I'd be very interested to hear other people's thresholds.

## Calculations

• Calculate how many p-values (could) have been calculated.
• If the study and analysis techniques were pre-registered then count how many p-values were calculated.
• If the study was not pre-registered, calculate how many different p-values could have been calculated (had the data looked different) which would have been equally justified as the ones that they did calculate (see Gelman’s garden of forking paths). This depends on how aggressive any hacking has been but roughly speaking I'd calculate:
• Number of input variables (including interactions) x Number of measurement variables
• Calculate expected number of type I errors
• Multiply answer from previous step by the threshold p-value of the paper
• Different results may have different thresholds which makes life a little more complicated

• Estimate Cohen’s d for the experiment (without looking at the actual result!)
• One option in estimating effect size is to not consider the specific intervention, but just to estimate how easy the target variable is to move for any intervention – see putanumonit for a more detailed explanation. I wouldn't completely throw away my prior on how effective the particular intervention in question is, but I do consider it helpful advice to not let my prior act too powerfully.
• Calculate experimental power
• You can calculate this properly but alternatively can use Lehr’s formula. Sample size equations for different underlying distributions can be found here.
• To get Power > 0.8 we require sample size per group of: $n = \frac{16}{d^2}$, where $d$ is Cohen’s d
• This is based on $\alpha = 0.05$, single p-value calculated, 2 samples of equal size, 2 tailed t-test.
• A modification to this rule to account for multiple p-values would be to add 3.25 to the numerator for each doubling of the number of p-values calculated previously.
• If sample sizes are very unequal (ratio of >10) then the number required in the smaller sample is the above calculation divided by 2. This also works for single sample tests against a fixed value.
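
A minimal sketch of the power rule of thumb above, assuming Lehr's numerator of 16 for a single p-value. The function name and parameters are hypothetical; the 3.25-per-doubling and divide-by-2 adjustments are taken straight from the bullets above.

```python
import math

def lehr_sample_size(d, n_pvalues=1, very_unequal=False):
    """Approximate sample size per group for power > 0.8.

    d            -- estimated Cohen's d (estimated before looking at the result)
    n_pvalues    -- number of p-values that were (or could have been) calculated
    very_unequal -- True if the group sizes differ by a ratio of more than 10
    """
    # 16 for a single p-value, plus 3.25 for each doubling of the p-value count.
    numerator = 16 + 3.25 * math.log2(n_pvalues)
    n = numerator / d ** 2
    if very_unequal:
        # Number required in the smaller sample (also for single-sample tests).
        n /= 2
    return math.ceil(n)

print(lehr_sample_size(0.5))               # medium effect, one p-value
print(lehr_sample_size(0.1, n_pvalues=8))  # small effect, 8 p-values
```

This is only as good as the estimate of Cohen's d fed into it, so it is worth trying a couple of plausible values of `d` rather than trusting a single number.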

## Thresholds

Roughly speaking, if expected type I errors is above 0.25 I’ll write the study off, between 0.05 and 0.25 I’ll be suspicious. If multiple significant p-values are found this gets a bit tricky due to non-independence of the p-values so more investigation may be required.
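
The type I error check can be sketched as follows. The function names are hypothetical, and treating exactly 0.05 (a single test at the usual threshold) as not suspicious is a boundary choice made for this sketch rather than something stated above.

```python
def expected_type_i_errors(n_inputs, n_measures, alpha=0.05):
    """Rough count of p-values that could have been calculated, times the threshold."""
    return n_inputs * n_measures * alpha

def verdict(expected_errors):
    """Apply the rough thresholds: above 0.25 write off, 0.05 to 0.25 suspicious."""
    if expected_errors > 0.25:
        return "write off"
    if expected_errors > 0.05:
        return "suspicious"
    return "ok"

for n_inputs, n_measures in [(1, 1), (2, 1), (8, 1)]:
    errors = expected_type_i_errors(n_inputs, n_measures)
    print(n_inputs, n_measures, errors, verdict(errors))
```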

If sample size is sufficient for power > 0.8 then I’m happy. If it comes out below then I’m suspicious and have to check whether my estimation for Cohen’s d is reasonable. If I'm still convinced N is a long way from being large enough I'll write the study off. Obviously as the paper has been published the calculated Cohen’s d is large enough to get a significant result but the question is do I believe that the effect size calculated is reasonable.

## Test

I tried Lehr’s formula on the 80,000 hours replication quiz. Of the 21 replications, my calculation gave a decisive answer in 17 papers, getting them all correct - 9 studies with comfortably oversized samples replicated successfully, 8 studies with massively undersized samples (less than half the required sample size I calculated) failed to replicate. Of the remaining 4 where the sample sizes were 0.5 – 1.2 x my estimate from Lehr’s equation, all successfully replicated.

(I remembered the answer to most of the replications but tried my hardest to ignore this when estimating Cohen's d.)

Just having a fixed minimum N wouldn’t have worked nearly as well – of the 5 smallest studies only 1 failed to replicate.

Comment by bucky on A new kind of Hermeneutics · 2019-10-30T20:40:46.465Z · score: 1 (2 votes) · LW · GW

Nitpick: Marx = 19th century?

Comment by bucky on Why are people so bad at dating? · 2019-10-29T10:00:16.288Z · score: 5 (3 votes) · LW · GW
The main thing that evolution optimized for is simply having a child, not for having a child with the most attractive possible person.

I think this undervalues the evolutionary importance of having an attractive partner (see sexual selection). If I have an attractive mate then my children are more attractive and in turn will have more opportunities to have children, significantly adding to my overall genetic fitness. This process can lead to spectacular results.

Introspecting purely on my base desires and not accounting for high level reasoning, I would trade ~3 chances to mate with a medium attractive person for one chance to mate with a highly attractive person. I wouldn't swap for people I find unattractive no matter how many. This suggests that, if I am typical of humanity, attractiveness of partner is actually more optimized for than simply having a child.

Comment by bucky on Does the US nuclear policy still target cities? · 2019-10-03T12:57:33.244Z · score: 3 (2 votes) · LW · GW

Being known to be vengeful may be the correct game-theoretic response in the absence of formal precommitment strategies.

I don't claim that Allied strategists were acting on game-theoretic considerations but that acting on a desire for vengeance means that one implements the response which one would have committed to if formal precommitment had been an option.

Comment by bucky on Don't depend on others to ask for explanations · 2019-09-19T08:39:34.127Z · score: 9 (5 votes) · LW · GW

A slight variation on this which I find a challenge is that when I start working on something the inferential distance between me and the target audience might not be that large. After I've spent a few hours/days/weeks thinking about something and researching it I might be a few inferential steps from where I started.

Going back and recreating those steps can be difficult unless I remember to note them down as I go.

Comment by bucky on Wolf's Dice · 2019-09-11T10:56:48.547Z · score: 2 (1 votes) · LW · GW
Note the symmetry factor with the factorials: we're computing the probability of the observed counts, not the probability of a particular string of outcomes, so we have to add up probabilities of all the outcomes with the same counts.

Can you clarify why we look at the probability of counts rather than the particular string?

The reason I'm asking is that if a problem has continuous outcomes instead of discrete then we automatically look at the string of outcomes instead of the count (unless we bin the results). Is this just a fundamental difference between continuous and discrete outcomes?

Comment by bucky on Age gaps and Birth order: Reanalysis · 2019-09-07T21:39:31.060Z · score: 1 (1 votes) · LW · GW

No worries, thanks for fixing my pictures!

Comment by bucky on Age gaps and Birth order: Reanalysis · 2019-09-07T19:11:57.385Z · score: 6 (4 votes) · LW · GW

This post was accidentally released a day early for a few hours before I moved it back into drafts. Apologies for any confusion.

Comment by bucky on Age gaps and Birth order: Failed reproduction of results · 2019-09-05T13:06:20.971Z · score: 20 (9 votes) · LW · GW

Fun fact: 7 survey respondents attempted to convert the number of minutes between them and their twin into a fraction of a year (e.g. 9.506E-06 years is 5 minutes). All 7 who did this were the older twin.

(I did include these people in the analysis above)

This provides evidence for the “Older twins care about being the oldest, younger twins don’t talk about it” hypothesis. I don’t think this will come as a massive surprise to anyone.

I understand that the price to swap birth order with your twin is a bowl of soup, although adjusting for 1% yearly inflation over 4000 years this now comes to 193 quadrillion bowls of soup.
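
For anyone wanting to check the arithmetic in this comment, a small sketch (assuming a 365.25-day year for the minutes conversion and annual compounding for the soup):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60   # assumes a 365.25-day year

# 5 minutes expressed as a fraction of a year.
five_minutes_in_years = 5 / MINUTES_PER_YEAR

# One bowl of soup compounded at 1% a year for 4000 years.
bowls_of_soup = 1.01 ** 4000

print(f"{five_minutes_in_years:.3e} years")
print(f"{bowls_of_soup / 1e15:.0f} quadrillion bowls")
```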

Comment by bucky on Analysis of a Secret Hitler Scenario · 2019-08-23T11:36:17.646Z · score: 2 (2 votes) · LW · GW

Firstly, I really like this kind of thing and enjoyed your analysis.

One thing I think it misses out is Marek's choice of who to inspect.

Liberal!Marek chooses without knowledge of who is fascist and who is liberal so has a 50:50 chance of selecting a fascist or a liberal. So if he is a liberal there is a 50:50 chance of him selecting a fascist, outing them and getting into this argument. (I'm ignoring the possibility that Marek will just say nothing)

Fascist!Marek already knows who is fascist/liberal and looking at the party membership card is a charade for him. He has 4 options:

1. Choose liberal, claim liberal

2. Choose liberal, claim fascist

3. Choose fascist, claim fascist

4. Choose fascist, claim liberal

On the surface option 3 doesn't seem likely. Options 1 and 2 are the options investigated in the OP (but assuming liberal was chosen by chance). Option 4 also seems like it might be used.

If we set option 4 to 0% (so that Marek is guaranteed to choose a liberal) and assume the 50:50 bold/timid split between options 1 and 2, then fascist!Marek has a 50:50 chance of getting into this argument - the same as liberal!Marek, so this provides no evidence either way.

If we instead split the probabilities of options 1, 2 and 4 as 25%:25%:50% then we return to the result in the OP. If option 4 is between 0 and 50% likely then the argument happening is somewhere between 0 and 1 bit of evidence in favour of Marek being liberal.
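
A rough sketch of how the evidence scales with the probability of option 4. It assumes, as above, that option 3 is never chosen, that options 1 and 2 split the remaining probability equally, and that only option 2 leads to this argument; the function name is hypothetical.

```python
import math

def bits_for_liberal(p_option4, p_argument_if_liberal=0.5):
    """Bits of evidence that Marek is liberal, given that the argument happened."""
    # Fascist!Marek only ends up in the argument via option 2
    # (choose liberal, claim fascist), which gets half of what option 4 leaves.
    p_argument_if_fascist = (1 - p_option4) / 2
    return math.log2(p_argument_if_liberal / p_argument_if_fascist)

for p4 in (0.0, 0.25, 0.5):
    print(p4, bits_for_liberal(p4))
```

The two endpoints reproduce the cases discussed above: no evidence when option 4 is never used, and 1 bit when it is used half the time.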

***

Of course fascist!Marek makes the choice between the 4 options in the knowledge that everyone already thinks he's probably a fascist (although he's probably not Hitler). This will affect his choice as he may be extra keen to send a signal that he isn't a fascist, so would ideally like to not accuse anyone in the knowledge that everyone will probably side with the person he accuses. He might choose option 1 as this will increase that person's trust in him and also cast doubt on that person in the mind of everyone else. Even option 3 might be appealing - it might harm Marek but it makes the person he accuses look very liberal.

But everyone knows that Marek is in this position and Marek knows that everyone knows so this begins to hurt my head and is also why this kind of game is amazing!

Harry, smiling, had asked Professor Quirrell what level he played at, and Professor Quirrell, also smiling, had responded, One level higher than you. - HPMor
Comment by bucky on Analysis of a Secret Hitler Scenario · 2019-08-23T10:00:22.844Z · score: 2 (2 votes) · LW · GW

The first mistake you mention is exactly the mistake I make when I don't convert to odds form as I mentioned here.

If I start with $P(\text{liberal}) = 0.5$ and him accusing gives me 1 bit of evidence (he's twice as likely to accuse if he's liberal) then the temptation is to split the uncertainty in half and update incorrectly to $0.75$.

Odds form helps - 1:1 becomes 2:1 after 1 bit of evidence so $P(\text{liberal}) = 2/3$.

More formally:
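
A sketch of the odds-form update, writing $L$ for "Marek is liberal", $F$ for "Marek is fascist" and $A$ for "Marek accuses" (these symbols are labels chosen here for illustration, and the 2:1 likelihood ratio is the 1 bit of evidence assumed above):

```latex
\frac{P(L \mid A)}{P(F \mid A)}
  = \frac{P(L)}{P(F)} \times \frac{P(A \mid L)}{P(A \mid F)}
  = \frac{1}{1} \times \frac{2}{1} = 2
\quad\Rightarrow\quad
P(L \mid A) = \frac{2}{1 + 2} = \frac{2}{3}
```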

Comment by bucky on Odds are not easier · 2019-08-21T22:38:23.408Z · score: 5 (3 votes) · LW · GW

I find if I try using probabilities in Bayes in my head then I make mistakes. If I start at 1/4 probability and get 1 bit of evidence to lower this further then I think “ok, I’ll update to 1/8”. If I use odds I start at 1:3, update to 1:6 and get the correct posterior of 1/7.

So essentially I’m constantly going back and forth - like you I find probabilities easier to picture but find odds easier for updates.
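
The back-and-forth can be written down as a small helper. This is a minimal sketch; the function name is hypothetical and exact fractions are used only to make the 1/7 easy to see.

```python
from fractions import Fraction

def update(prob, likelihood_ratio):
    """Convert a probability to odds, apply the evidence, convert back."""
    odds = prob / (1 - prob)                  # 1/4 -> 1:3
    posterior_odds = odds * likelihood_ratio  # 1 bit against -> 1:6
    return posterior_odds / (1 + posterior_odds)

# 1 bit of evidence against is a likelihood ratio of 1/2.
posterior = update(Fraction(1, 4), Fraction(1, 2))
print(posterior)
```

Halving the probability directly would give 1/8, which is close here but drifts further from the correct answer as the prior gets nearer to 1/2.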

Comment by bucky on Laplace Approximation · 2019-08-21T10:51:28.626Z · score: 1 (1 votes) · LW · GW

For an introduction to MCMC aimed at a similar level target audience, I found this explanation helpful.

Comment by bucky on Why do humans not have built-in neural i/o channels? · 2019-08-11T20:19:02.571Z · score: 1 (1 votes) · LW · GW

Communication requires both input and output channels. All of the instances I can think of from the animal world involve a sense (hearing, sight, smell, touch) which has evolved with a different benefit. Then an output can evolve to communicate using this sense as the input.

This seems orders of magnitude less complex than evolving input and output simultaneously which would be required for direct brain communication (at least I can't think of another option).

Even if it could potentially happen, before it did there would be many instances of indirect communication evolving. Take-off happening first in a species with indirect communication is a fairly inevitable consequence of the relative complexity of the evolutions required.

Comment by bucky on Why Subagents? · 2019-08-02T21:28:00.482Z · score: 3 (2 votes) · LW · GW

Imagine a second agent which has the same preferences but an anti-status-quo preference between mushroom and pepperoni.

This would be exploitable by a third agent who is able to compare mushroom and pepperoni but assigns equal utilities to both. However the original agent described in the OP would not be able to exploit agent 2 (if agent 1's status-quo bias is larger than agent 2's anti-status-quo bias), so agent 3 dominates agent 1 in terms of performance.

Over multiple dimensions agent 3 becomes much more complex than agent 1. Having a status quo bias makes sense as a way to avoid being exploited whilst also being less computationally expensive than tracking or calculating every preference ordering.

Assuming agent 2 is rare, the loss incurred by not being able to exploit others is small.

Comment by bucky on Drive-By Low-Effort Criticism · 2019-07-31T19:41:15.570Z · score: 7 (4 votes) · LW · GW
Start with lower-effort posts, to get a sense of how people react to the headline and thesis statement.

Shortform seems like a great way to do this.

Comment by bucky on From Laplace to BIC · 2019-07-24T22:24:41.147Z · score: 1 (1 votes) · LW · GW

In removing the terms I think we're removing all of the widths of the peak in the various dimensions. So in the case where the widths are radically different between the models this would mean that N would need to be even larger for BIC to be a useful approximation.

The widths issue might come up, for example, when an additional parameter is added which splits the data into 2 populations with drastically different population sizes - the small population is likely to have a wider peak.

Is that right?

Comment by bucky on Laplace Approximation · 2019-07-21T20:46:59.587Z · score: 1 (1 votes) · LW · GW

Thanks for this sequence, I've read each post 3 or 4 times to try to properly get it.

Am I right in thinking that in order to replace we not only require a uniform prior but also that span unit volume?

Comment by bucky on What do you think of cognitive types and MBTI? What type are you? What do you think is the percentage of the 16 different personality types on LessWrong? · 2019-07-19T09:24:12.838Z · score: 4 (3 votes) · LW · GW

The last one appears to be 2016 (this was a slightly wider survey which included other rationalist communities) which was before the lesswrong 2.0 relaunch. I haven't heard of any plans for surveys - maybe a mod can fill us in.

Slatestarcodex does an annual survey of its readers. Scott pre-registers some investigations and then reports on results. This year, for example, he got a negative result on "Math preference vs Corn eating style" and more interesting results in the ongoing birth-order investigation.

Comment by bucky on What do you think of cognitive types and MBTI? What type are you? What do you think is the percentage of the 16 different personality types on LessWrong? · 2019-07-18T22:51:55.733Z · score: 23 (6 votes) · LW · GW

My own feelings on MBTI are similar to this SSC post - it's unscientific but manages to kinda work as long as you don't expect too much of it. I wouldn't make any life decisions based on it!

For the third part of the question we don't have to guess - the 2012 lesswrong survey included an MBTI question. Of the people who answered, 65% were INTP or INTJ, compared to 5-9% of Americans according to the MBTI website.

Comment by bucky on Let's Read: Superhuman AI for multiplayer poker · 2019-07-14T21:36:58.005Z · score: 7 (5 votes) · LW · GW

Thanks for this.

Nitpick:

The description of a big blind:

Big blind: the minimal money/poker chips that every player must bet in order to play. For example, \$0.1 would be a reasonable amount in casual play.

sounds more like an ante than a big blind. This is important for understanding the discussion of limping in Ars Technica.

Comment by bucky on Book Review: The Secret Of Our Success · 2019-07-06T22:00:46.353Z · score: 1 (1 votes) · LW · GW

Yes, that’s definitely upward selection pressure but I think that’s more evidence for “ability to solve problems” being the cause of our intelligence rather than “ability to transmit culture”.

Most cultural processes could be transmitted by being shown what to do and punished if you do it wrong. Language makes it easier but isn’t necessarily required. Chimps have some fairly complex tool kits, knowledge of which appears to be transmitted culturally.

Comment by bucky on Everybody Knows · 2019-07-05T05:28:44.180Z · score: 5 (5 votes) · LW · GW

A version of this that I hear fairly often is “it’s common sense that...”

It works in the same way in that it makes it socially costly to argue against but is more insidious than “everybody knows” (at least in my circles “it’s common sense” has more of a veneer of respectability).

Both also have their proper uses which I think makes the improper uses more difficult to counter.

Comment by bucky on What are principled ways for penalising complexity in practice? · 2019-06-30T21:44:37.387Z · score: 1 (1 votes) · LW · GW

Thanks for this. I’m trying to get an intuition on how this works.

My mental picture is to imagine the likelihood function with respect to theta of the more complex model. The simpler model is the equivalent of a square function with height of its likelihood and width 1.

The relative areas under the graphs reflect the likelihood of the models. So picturing the relative maximum likelihoods and how sharp the peak is on the more complex model gives an impression of the Bayes factor.

Does that work? Or is there a better mental model?
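
One way to play with this picture numerically is a toy coin-flip comparison. Everything in it is an assumption made for illustration: the simple model fixes theta at 0.5, the complex model has a uniform prior on theta over a range of width 1, and the function names are hypothetical.

```python
import math

def likelihood(theta, heads, tails):
    """Likelihood of one particular sequence with the given counts."""
    return theta ** heads * (1 - theta) ** tails

def area_under_likelihood(heads, tails, steps=10_000):
    """Midpoint-rule area under the likelihood curve for theta in [0, 1]."""
    width = 1.0 / steps
    return sum(likelihood((i + 0.5) * width, heads, tails)
               for i in range(steps)) * width

def bayes_factor(heads, tails, theta_simple=0.5):
    """Complex model's area divided by the simple model's height x width 1."""
    simple_area = likelihood(theta_simple, heads, tails) * 1.0
    return area_under_likelihood(heads, tails) / simple_area

print(bayes_factor(8, 2))   # lopsided data
print(bayes_factor(5, 5))   # data that fits the simple model well
```

A taller peak helps the complex model and a narrower peak hurts it, which is the trade-off between the relative maximum likelihoods and the sharpness of the peak.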

Comment by bucky on What's up with self-esteem? · 2019-06-25T13:36:40.094Z · score: 44 (11 votes) · LW · GW

From the literature on self esteem

Previously, I thought that self-worth was like an estimate of how valuable you are to your peers

is sociometer theory and

Now I think there's an extra dimension which has to do with simpler dominance-hierarchy behavior.

is hierometer theory.

Hierometer theory is relatively new (2016) and could be thought of as a subset of sociometer theory if sociometer theory is interpreted more broadly. Accordingly it has less research backing it up and that which is there is mostly by the original proponents of the theory.

This paper gives an introduction to both and a summary of evidence (I found this diagram a useful explanation of the difference). The paper suggests that both are true to some extent and complement each other.

I've included some quotes below.

## Sociometer theory

Sociometer theory starts from the premise that human beings have a fundamental need to belong (Baumeister and Leary, 1995). Satisfying this need is advantageous: group members, when cooperating, afford one another significant opportunities for mutual gain (von Mises, 1963; Nowak and Highfield, 2011; Wilson, 2012). Accordingly, if individuals are excluded from key social networks, their prospects for surviving and reproducing are impaired. It is therefore plausible to hypothesize that a dedicated psychological system evolved to encourage social acceptance (Leary et al., 1995).
...
The original version of sociometer theory (Leary and Downs, 1995; Leary et al., 1995) emphasizes how self-esteem tracks social acceptance, by which is implied some sort of community belongingness, or social inclusion.
...
In contrast, the revised version (Leary and Baumeister, 2000) emphasizes how self-esteem tracks relational value, defined as the degree to which other people regard their relationship with the individual as important or valuable overall, for whatever reason.

## Hierometer theory

Like sociometer theory, hierometer theory proposes that self-regard serves an evolutionary function. Unlike sociometer theory, it proposes that this function is to navigate status hierarchies. Specifically, hierometer theory proposes that self-regard operates both indicatively—by tracking levels of social status—and imperatively—by regulating levels of status pursuit (Figure 1).
...
Note here some key differences between hierometer theory and dominance theory (Barkow, 1975, 1980), another alternative to sociometer theory (e.g., Leary et al., 2001). Dominance theory, plausibly interpreted, states that self-esteem tracks, not levels of social acceptance or relational value, but instead levels of “dominance” or “prestige,” by which some social or psychological, rather than behavioral, construct is meant.
...
Accordingly, hierometer theory proposes that higher (lower) prior social status promotes a behavioral strategy of augmented (diminished) assertiveness, with self-regard acting as the intrapsychic bridge—in particular, tracking social status in the first instance and then regulating behavioral strategy in terms of it. Note that the overall dynamic involved is consolidatory rather than compensatory: higher rather than lower status is proposed to lead to increased assertiveness. In this regard, hierometer theory differs from dominance theory, which arguably implies that it is losses in social status that prompt attempts to regain it (Barkow, 1980).

## Findings

... our findings are arguably consistent with the revised version of sociometer theory, which is equivocal about the type of relational value that self-esteem tracks, and by extension, the type of social acceptance that goes hand in hand with it. Indeed, hierometer theory, and the original version of sociometer theory, might each be considered complementary subsets of the revised version of sociometer theory, if the latter is construed very broadly as a theory which states that types of social relations (status, inclusion), which constitute different types of relational value, regulate types of behavioral strategies (assertiveness, affiliativeness) via types of self-regard (self-esteem, narcissism). If so, then our confirmatory findings for hierometer theory, and mixed findings for the original version of sociometer theory, would still suggest that the revised version of sociometer theory holds truer for agentic variables than for communal ones.
Comment by bucky on No, it's not The Incentives—it's you · 2019-06-16T11:30:22.371Z · score: 1 (1 votes) · LW · GW

Take out the “10mph over” and I think this would be both fairer than the existing system and more effective.

(Maybe some modification to the calculation of the average to account for queues etc.)

Comment by bucky on No, it's not The Incentives—it's you · 2019-06-16T10:57:20.507Z · score: 1 (1 votes) · LW · GW

On reflection I’m not sure “above average” is a helpful frame.

I think it would be more helpful to say someone being “net negative” should be a valid target for criticism. Someone who is “net positive” but imperfect may sometimes still be a valid target depending on other considerations (such as moving an equilibrium).

Comment by bucky on No, it's not The Incentives—it's you · 2019-06-15T20:40:54.975Z · score: 6 (3 votes) · LW · GW

Trying to steelman the quoted section:

If one were to be above average but imperfect (e.g. not falsifying data or p-hacking but still publishing in paid access journals) then being called out for the imperfect bit could be bad. That person’s presence in the field is a net positive but if they don’t consider themselves able to afford the penalty of being perfect then they leave and the field suffers.

I’m not sure I endorse the specific example there but in a personal example:

My incentive at work is to spend more time on meeting my targets (vs other less measurable but important tasks) than is strictly beneficial for the company.

I do spend more time on these targets than would be optimal but I think I do this considerably less than is typical. I still overfocus on targets as I’ve been told in appraisals to do so.

If someone were to call me out on this I think I would be justified in feeling miffed, even if the person calling me out was acting better than me on this axis.

Comment by bucky on Book Review: The Secret Of Our Success · 2019-06-07T23:23:35.711Z · score: 6 (4 votes) · LW · GW

Henrich counters with his own Cultural Intelligence Hypothesis – humans evolved big brains in order to be able to maintain things like Inuit seal hunting techniques.

I can’t really see how this would work.

Partly this is because maintaining techniques like this doesn’t seem difficult enough to justify just how intelligent humans are - on a scale of chimp to human it seems like it’s more on the chimp end. The fact that inventing the technique is impressive doesn’t imply that learning the technique is impressive.

But mainly I can’t see the selection pressure for increasing intelligence. Not being able to remember the hunting technique is obviously bad but where is the upwards selection pressure?

I definitely agree that Cultural Intelligence is important and is one of the ways humans have used their intelligence but I think the Machiavellian Intelligence Hypothesis is a stronger candidate for the root cause.

Comment by bucky on Steelmanning Divination · 2019-06-06T09:38:22.694Z · score: 21 (11 votes) · LW · GW

In an innovation workshop we were taught the following technique:

Make a list of 6 things your company is good at

Make a list of 6 applications of your product(s)

Make a list of 6 random words (Disney characters? City names?)

Roll 3 dice and select the corresponding words from the lists. Think about those 3 words and see what ideas you can come up with based on them.

Everyone I spoke to agreed that this was the best technique which we were taught. I knew constrained creativity was a thing but I think using this technique really drove the point home. I don't think this is quite the same thing as traditional divination (e.g. you can repeat this a few times and then choose your best idea) but I wonder if it is relying on similar principles.
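The dice step is easy to sketch in code. This is only an illustration of the technique as described above: the function name `pick_prompt` and the example list entries are made up for the sketch, not taken from the workshop.

```python
import random

def pick_prompt(strengths, applications, random_words, rng=random):
    """Roll one six-sided die per list and return the three selected entries."""
    lists = (strengths, applications, random_words)
    for items in lists:
        if len(items) != 6:
            raise ValueError("each list needs exactly 6 entries, one per die face")
    # rng.randint(1, 6) plays the role of a die roll; subtract 1 to index the list.
    return tuple(items[rng.randint(1, 6) - 1] for items in lists)

# Placeholder entries, purely illustrative.
strengths = ["fast prototyping", "customer support", "precision machining",
             "logistics", "data analysis", "industrial design"]
applications = ["home heating", "medical devices", "car parts",
                "packaging", "toys", "sports equipment"]
random_words = ["Simba", "Lisbon", "Ariel", "Osaka", "Dumbo", "Cairo"]

if __name__ == "__main__":
    # Repeat a few times and keep whichever combination sparks the best idea.
    for _ in range(3):
        print(pick_prompt(strengths, applications, random_words))
```

Rolling several times and keeping the best idea is the part that differs from traditional divination, where you would normally accept the first draw.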

Comment by bucky on FB/Discord Style Reacts · 2019-06-06T07:30:24.627Z · score: 2 (2 votes) · LW · GW

"I especially like/benefited from this bit:

Quote from post/comment"

Comment by bucky on How is Solomonoff induction calculated in practice? · 2019-06-05T21:13:15.838Z · score: 3 (2 votes) · LW · GW

Well that explains why I was struggling to find anything online!

Thanks for the link, I’ve been going through some of the techniques.

Using AIC the penalty for each additional parameter is a factor of e in the likelihood. For BIC the equivalent is $\sqrt{n}$, so the more samples the more penalised a complex model is. For large n the two criteria diverge - are there principled methods for choosing which regularisation to use?
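As a sanity check on those penalty factors, here is a minimal sketch using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of parameters, n the number of samples and L the maximised likelihood. The function names are my own, chosen for illustration. An extra parameter leaves AIC unchanged when the likelihood improves by a factor of e, and leaves BIC unchanged when it improves by a factor of sqrt(n).

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

def break_even_ratio_aic():
    """Likelihood ratio needed to offset one extra parameter under AIC."""
    # The penalty rises by 2, so ln L must rise by 1, i.e. L by a factor of e.
    return math.exp(2 / 2)

def break_even_ratio_bic(n):
    """Likelihood ratio needed to offset one extra parameter under BIC."""
    # The penalty rises by ln(n), so ln L must rise by ln(n)/2, i.e. L by sqrt(n).
    return math.exp(math.log(n) / 2)

if __name__ == "__main__":
    for n in (5, 10, 100, 10_000):
        print(n, break_even_ratio_aic(), break_even_ratio_bic(n))
```

The two criteria agree at n = e² (about 7.4); above that BIC is the stricter of the two, and the gap grows without bound as n increases.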

Comment by bucky on How is Solomonoff induction calculated in practice? · 2019-06-05T19:27:55.049Z · score: 2 (2 votes) · LW · GW

Yes, this is helpful - I had thought of Solomonoff induction as only calculating the prior, so it’s good to understand the terminology properly.

Comment by bucky on Book review: The Sleepwalkers by Arthur Koestler · 2019-05-31T11:10:39.203Z · score: 3 (2 votes) · LW · GW

If the curves are constructed randomly and independently then in some cases a linear relationship would be implied by the central limit theorem.

Not sure if this is helpful or not - CLT assumptions may or may not be valid in the instances you're thinking of. I think my brain just went "Sum of many different variables leading to a surprising regular pattern? That reminds me of CLT".

Comment by bucky on Simple Rules of Law · 2019-05-20T08:03:53.554Z · score: 1 (1 votes) · LW · GW

For L], what would be the effect of scenario 1.5 - CEOs are fired if (but not only if) they are judged to be bad for the stock price?

One option would be that if the CEO is fired for reasons other than the prediction market, the market doesn't pay out and all bets are refunded - not sure if this would help or hinder!

Note: There's an unfinished sentence in this section, end of 3rd to last paragraph

So I think that realistically

Comment by bucky on My poorly done attempt of formulating a game with incomplete information. · 2019-05-03T22:29:51.008Z · score: 2 (2 votes) · LW · GW

I wonder what would happen if one were to remove b and play the game iteratively. The game stops after 50 iterations or the first time S fails the test or defects.

b is then essentially replaced by S’s expected payoff over the remaining iterations if he remains loyal. However M would know this value so the game might need further modification.

Comment by bucky on My poorly done attempt of formulating a game with incomplete information. · 2019-05-02T10:24:29.693Z · score: 3 (2 votes) · LW · GW

Thanks for posting, I had fun trying to solve it and I think I learned a few things.

My solution is below (I think this is correct but I’m no expert) but I’ve hidden it in a spoiler in case you’re still wanting to figure it out yourself!

M has preference order of . He wants to set r such that if S has then S will pass the test and then remain loyal. If S has then M wants S to fail the test and therefore not get the chance to defect in round 2. It is common knowledge that this is what M wants.

Starting by making S’s Payoff for 2b less than that for 1 gives a formula for r:

for some small positive

With this value for r, S’s payoff matrix becomes:

1.

2a.

2b.

We can see that if then S’s best payoff is obtained by choosing 2a. Otherwise his best payoff is 1. This is exactly what M wants - he has changed S's payoffs to make S's preference order the same as his to the greatest extent possible.

Due to M's preference being common knowledge, S knows that M will choose this value of r and therefore knows what v is before he chooses whether to pass the test () and can choose between the three options simultaneously.

This is an interesting result as M's decision on r does not depend on the tax rate - he must always set an obedience test to be slightly more aversive than the entire value that is at stake. The tax rate only affects whether S will choose to pass the test.