Using smart thermometer data to estimate the number of coronavirus cases 2020-03-23T04:26:32.890Z
Case Studies Highlighting CFAR’s Impact on Existential Risk 2017-01-10T18:51:53.178Z
Results of a One-Year Longitudinal Study of CFAR Alumni 2015-12-12T04:39:46.399Z
The effect of effectiveness information on charitable giving 2014-04-15T16:43:24.702Z
Practical Benefits of Rationality (LW Census Results) 2014-01-31T17:24:38.810Z
Participation in the LW Community Associated with Less Bias 2012-12-09T12:15:42.385Z
[Link] Singularity Summit Talks 2012-10-28T04:28:54.157Z
Take Part in CFAR Rationality Surveys 2012-07-18T23:57:52.193Z
Meetup : Chicago games at Harold Washington Library (Sun 6/17) 2012-06-13T04:25:05.856Z
Meetup : Weekly Chicago Meetups Resume 5/26 2012-05-16T17:53:54.836Z
Meetup : Weekly Chicago Meetups 2012-04-12T06:14:54.526Z
[LINK] Being proven wrong is like winning the lottery 2011-10-29T22:40:12.609Z
Harry Potter and the Methods of Rationality discussion thread, part 8 2011-08-25T02:17:00.455Z
[SEQ RERUN] Failing to Learn from History 2011-08-09T04:42:37.325Z
[SEQ RERUN] The Modesty Argument 2011-04-23T22:48:04.458Z
[SEQ RERUN] The Martial Art of Rationality 2011-04-19T19:41:19.699Z
Introduction to the Sequence Reruns 2011-04-19T19:39:41.706Z
New Less Wrong Feature: Rerunning The Sequences 2011-04-11T17:01:59.047Z
Preschoolers learning to guess the teacher's password [link] 2011-03-18T04:13:23.945Z
Harry Potter and the Methods of Rationality discussion thread, part 7 2011-01-14T06:49:46.793Z
Harry Potter and the Methods of Rationality discussion thread, part 6 2010-11-27T08:25:52.446Z
Harry Potter and the Methods of Rationality discussion thread, part 3 2010-08-30T05:37:32.615Z
Harry Potter and the Methods of Rationality discussion thread 2010-05-27T00:10:57.279Z
Open Thread: April 2010, Part 2 2010-04-08T03:09:18.648Z
Open Thread: April 2010 2010-04-01T15:21:03.777Z


Comment by Unnamed on What was my mistake evaluating risk in this situation? · 2021-08-06T00:54:43.282Z · LW · GW
  1. It sounds like you were relying pretty heavily on the amount of alarm in the media as one of your main indicators of how much to worry (while using an interpretive filter). What you took from the swine flu example is that the media tends to be too alarmist, but you also/instead could've concluded that the media is not very good at risk assessment (and maybe isn't even trying that hard to do good risk assessments). The line of reasoning that this new virus is probably less dangerous than swine flu because there's less media hype depends on the assumption that the level of alarm in the media is strongly correlated with the level of danger (with a systematic bias towards exaggerated alarm); I think the correlation between media alarm and danger is not that strong, in which case this argument doesn't go through. So, the media isn't that functional as an alarm, and you need some other approach for figuring out if there's a big problem. Really bad pandemics are possible, and the amount of alarm in the media isn't that strong an indicator of whether a new virus is likely to turn into a bad pandemic, so how could I tell if one is coming? You maybe could still use the media as an initial alert: the media is alarmed about this thing, and it is the sort of thing that has the potential to be really bad, so I'll take that as my cue to put effort into understanding what's going on (via some other approach that doesn't rely on the media). Or, you could try to be plugged into other information environments which are more reliable, such that you'd trust them more if they raised an alarm. I benefited from hearing things like this and this, and similar things by word-of-mouth & ephemeral Facebook posts.
  2. It helps to think in terms of probabilities & expected values. Scott wrote about this in some depth in A Failure, But Not of Prediction. For example: If swine flu turned out to be unimportant after the media hyped it up, that gives reason to think that the probability that the next media-hyped virus will be really bad is more like 25% than 75%. But it doesn't give much reason to think that the probability is more like 1% than 25% - not enough data for that. And if you see the probability that a novel coronavirus will turn into a really bad pandemic as being as high as 25%, then it's worth investigating & preparing for.
  3. It sounds like a big part of your lack of concern is that you thought the illness wasn't that serious, so that even if the virus became widespread it wouldn't affect you much. My memory is that this is different from the reasoning of most people who weren't very concerned, as it was more common to think that the virus wouldn't become widespread. So (a) this mismatch seems like a clue that something might be up, and worth looking into. And (b) I think there were reasons to think that the virus would be a big deal if it became widespread, e.g. the lives of people in Wuhan had changed in pretty drastic ways as a result of the virus.
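
The claim in point 2 about how far a single false alarm should move you can be checked with a quick likelihood-ratio calculation. This is only a rough sketch, assuming a deliberately simplified model in which each media-hyped virus independently turns out really bad with some fixed probability; the function names and the one-observation setup are illustrative assumptions, not something from the comment.

```python
def likelihood_of_false_alarm(p_bad: float) -> float:
    """Probability that one hyped virus turns out NOT to be bad, given p_bad."""
    return 1.0 - p_bad


def likelihood_ratio(p_a: float, p_b: float) -> float:
    """How strongly one observed false alarm favors hypothesis p_a over p_b."""
    return likelihood_of_false_alarm(p_a) / likelihood_of_false_alarm(p_b)


# One observation: swine flu was hyped and turned out to be unimportant.
ratio_25_vs_75 = likelihood_ratio(0.25, 0.75)  # 0.75 / 0.25
ratio_01_vs_25 = likelihood_ratio(0.01, 0.25)  # 0.99 / 0.75

print(f"25% vs 75%: {ratio_25_vs_75:.2f} to 1")
print(f"1% vs 25%:  {ratio_01_vs_25:.2f} to 1")
```

Under these assumptions the single false alarm favors "25%" over "75%" by about 3 to 1, but favors "1%" over "25%" by only about 1.3 to 1, which is the sense in which there is not enough data to rule out 25%.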
Comment by Unnamed on The Point of Trade · 2021-06-24T02:21:07.271Z · LW · GW

A few things that seem relevant (5 things, but maybe not crystallized in the right way to be 5 separate answers to the OP's question):

Quantity: mismatch between natural quantity produced and quantity desired. Maybe I can plant an apple tree & pick the apples, but I don't want a whole treeful of apples. Maybe the most efficient way to make t-shirts is to build a big machine that makes a million shirts, and a process for making just 10 of them is wildly inefficient in terms of resources per shirt. (related keywords: economies of scale, capital investments)

Timing: maybe I want something now & it would take half an hour for me to make it (and what I'll want is unpredictable, I can't keep a giant inventory of everything I might want - though maybe the "all of us liked exactly the same objects exactly the same amount" supposition erases this issue). If I need to plant a tree and wait for it to grow apples that'll take years. Building the giant t-shirt machine might involve more than a single lifetime of person-hours of work.

Some things you can't do for yourself: if I Matrix-learn how to give good massages, that won't let me get a good massage - I need someone else to do that. That requires making some kind of deal with another person (maybe I give them a massage some other time?), which is at least kind of like trade. What counts as "trade"? Some simple cases seem not that trade-like, e.g. I play tennis with someone because you can't play a tennis match by yourself, although we could frame it as trade-like: I'm providing them with a tennis opponent and in exchange they're providing me with a tennis opponent. The massage exchange seems more trade-like (because asynchronous?), other cases where we aren't just exchanging the exact same service even more so.

Some production involves multiple people: Maybe it takes multiple people to carry a heavy object, or to pick apples (one in the tree & one on the ground with a basket?), or to operate a giant machine. Or people decide they'd rather do it together with other people (e.g. because the total quantity produced is more than any one person wants, or to finish the job more quickly). So there needs to be some kind of deal between the people on how to divide up what they produce. That also seems kind of like trade. Less so in some simple cases (a group of people dividing up the work equally and then dividing up the output equally), more so as it gets more complicated.

Complicated many-person coordination: Maybe a bunch of people work together to build, supply, and operate the giant t-shirt machine and divide the t-shirts between themselves. But the most efficient way to get the screws for the machine is with a giant machine that produces way more screws than the t-shirt machine needs, so most of the screws are used for other purposes. And the most efficient way to make the fertilizer for the cotton fields involves making way more fertilizer than is needed for the t-shirt cotton. Etc. So we have multiple giant teams, partially overlapping, e.g. the screw-machine-makers have a small role in the t-shirt production & in the production of many other things, and so get a little share of each. With long supply chains this might look more like screw-machine-makers bartering screws for t-shirts rather than screw-machine-makers being part of the t-shirt team. And if you think about schemes for arranging all of this, those can start to look more like trade and an economy, e.g. all these people communicating & coordinating to figure out how much to make of each thing and where to send it and so on might want to use something that looks a lot like prices (related keywords: the knowledge problem).

Comment by Unnamed on 10 Deadly Viruses And Bacteria Created In Labs - 6 SARS 2.0 · 2021-06-20T06:16:19.190Z · LW · GW

Also in early 2019, Kelsey Piper's article Biologists are trying to make bird flu easier to spread. Can we not? was published at Vox (Future Perfect).

Comment by Unnamed on On the unpopularity of cryonics: life sucks, but at least then you die · 2021-05-21T04:20:30.703Z · LW · GW

That's a Nas sample. You might like Illmatic.

Comment by Unnamed on Deliberately Vague Language is Bullshit · 2021-05-14T22:41:57.641Z · LW · GW

A Paul Graham essay which is more directly related to this topic is How to Write Usefully:

Useful writing makes claims that are as strong as they can be made without becoming false.

For example, it's more useful to say that Pike's Peak is near the middle of Colorado than merely somewhere in Colorado. But if I say it's in the exact middle of Colorado, I've now gone too far, because it's a bit east of the middle.

Precision and correctness are like opposing forces. It's easy to satisfy one if you ignore the other. The converse of vaporous academic writing is the bold, but false, rhetoric of demagogues. Useful writing is bold, but true.

Comment by Unnamed on Monastery and Throne · 2021-04-09T07:50:35.588Z · LW · GW

The Nudgerism section seems to be mushing together various psychology-related things which don't have much to do with nudging.

Things like downplaying risks in order to prevent panic are at most very loosely related to nudging, and at least as ancient as the practice of placing objects at eye-level. Seems like an over-extension of focusing on "morale" and other Leaders of Men style attributes.

The main overlaps between the book Nudge and the awful The Cognitive Bias That Makes Us Panic About Coronavirus Bloomberg article are 1) they were both written by Cass Sunstein and 2) the one intervention that's explicitly recommended in the Bloomberg article is publicizing accurate information about coronavirus risk probabilities.

One of the main themes of the nudge movement is that human behavior is an empirical field that can be studied, and one of the main flaws of the thing being called "nudgerism" is making up ungrounded (and often inaccurate) stories about how people will behave (such as what things will induce a "false sense of security"). These stories often are made by people without relevant expertise who don't even seem to be trying very hard to make accurate predictions.

The British government has a Behavioural Insights Team which is colloquially known as the Nudge Unit; I'd guess that they didn't have much to do with the screwups that are being called "nudgerism."

Comment by Unnamed on Eli's shortform feed · 2021-03-30T08:28:10.645Z · LW · GW 

Comment by Unnamed on Improvement for pundit prediction comparisons · 2021-03-28T20:26:48.410Z · LW · GW

I expect it will be easier to get Metaculus users to make forecasts on pundits' questions than to get pundits to make forecasts on each other's questions.

Suggested variant (with dates for concreteness):

Dec 1: deadline for pundits to submit their questions
Dec 10: Metaculus announces the final version of all the questions they're using, but does not open markets
Dec 20: deadline for pundits & anyone else to privately submit their forecasts (maybe hashed), and Metaculus markets open
Dec 31: current Metaculus consensus becomes the official Metaculus forecast for the questions, and pundits (& anyone else) can publicize the forecasts that they made by Dec 20

Contestants (anyone who submitted forecasts by Dec 20) mainly get judged based on how they did relative to the Dec 31 Metaculus forecast. I expect that they will mostly be pundits making forecasts on their own questions, plus forecasting aficionados.

(We want contestants & Metaculus to make their forecasts simultaneously, with neither having access to the other's forecasts, which is tricky since Metaculus is a public platform. That's why I have the separate deadlines on Dec 20 & Dec 31, with contestants' forecasts initially private - hopefully that's a short enough time period so that not much new information should arise, and long enough for people to have time to make forecasts.)
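
The "maybe hashed" private submission could work as a simple commit-and-reveal scheme: publish only a hash by Dec 20, then reveal the forecasts after Dec 31 so anyone can check them against the hash. A minimal sketch, assuming SHA-256 with a random salt; the function names, the JSON encoding, and the example question labels are illustrative choices, not anything the platform provides.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict, Tuple


def commit(forecasts: Dict[str, float]) -> Tuple[str, str]:
    """Return (commitment, salt). Publish the commitment; keep the salt private."""
    # A random salt keeps others from brute-forcing low-entropy forecasts.
    salt = secrets.token_hex(16)
    payload = json.dumps(forecasts, sort_keys=True) + salt
    return hashlib.sha256(payload.encode("utf-8")).hexdigest(), salt


def verify(commitment: str, forecasts: Dict[str, float], salt: str) -> bool:
    """Check revealed forecasts and salt against the earlier commitment."""
    payload = json.dumps(forecasts, sort_keys=True) + salt
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, commitment)


# Hypothetical forecasts submitted privately by the Dec 20 deadline.
my_forecasts = {"question-1": 0.80, "question-2": 0.35}
commitment, salt = commit(my_forecasts)

# After Dec 31, revealing the forecasts and the salt lets anyone verify them.
revealed_ok = verify(commitment, my_forecasts, salt)
tampered_ok = verify(commitment, {"question-1": 0.60, "question-2": 0.35}, salt)
```

Here `revealed_ok` is true and `tampered_ok` is false, so a contestant cannot quietly adjust a forecast after seeing the consensus.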

With only a small sample size of questions, it may be more meaningful to evaluate contestants based on how close they came to the official Metaculus forecast rather than on how accurate they were (there's a bias-variance tradeoff). As a contestant does more questions (this year or over multiple years), the comparison with what actually happened becomes more meaningful.

Comment by Unnamed on Strong Evidence is Common · 2021-03-14T01:27:52.670Z · LW · GW

Maybe a nitpick, but the driver's license posterior of 95% seems too high. (Or at least the claim isn't stated precisely.) I'd have less than a 95% success rate at guessing the exact name string that appears on someone's driver's license. Maybe there's a middle name between the "Mark" and the "Xu", maybe the driver's license says "Marc" or "Marcus", etc.

I think you can get to 95% with a phone number or a wifi password or similar, so this is probably just a nitpick.

Comment by Unnamed on Where does the phrase "central example" come from? · 2021-03-12T06:39:40.058Z · LW · GW

Although maybe not that disproportionate - one recent post was throwing off the search results. Without it, rationalish subreddits still show up a few times on the first couple pages of search results, but not overwhelmingly.

Comment by Unnamed on Where does the phrase "central example" come from? · 2021-03-12T06:31:11.105Z · LW · GW

Searching for the phrase on Reddit does turn up a disproportionate number of hits from /r/slatestarcodex. So not LW-exclusive, but maybe unusually common around here. Possibly traceable to Weak Men Are Superweapons:

What is the problem with statements like this?

First, they are meant to re-center a category. Remember, people think in terms of categories with central and noncentral members – a sparrow is a central bird, an ostrich a noncentral one. But if you live on the Ostrich World, which is inhabited only by ostriches, emus, and cassowaries, then probably an ostrich seems like a pretty central example of ‘bird’ and the first sparrow you see will be fantastically strange.

Right now most people’s central examples of religion are probably things like your local neighborhood church. If you’re American, it’s probably a bland Protestant denomination like the Episcopalians or something.

The guy whose central examples of religion are Pope Francis and the Dalai Lama is probably going to have a different perception of religion than the guy whose central examples are Torquemada and Fred Phelps. If you convert someone from the first kind of person to the second kind of person, you’ve gone most of the way to making them an atheist.

Comment by Unnamed on Where does the phrase "central example" come from? · 2021-03-12T06:20:29.106Z · LW · GW

It's not a LW-distinctive phrase. Try searching Google News, for instance. It falls out of spatial models of concepts such as prototype theory, e.g. a robin is a central example of a bird while an ostrich is not.

Comment by Unnamed on Why Hasn't Effective Altruism Grown Since 2015? · 2021-03-10T06:05:40.779Z · LW · GW

The "all other money moved" bars on the first GiveWell graph (which I think represent donations from individual donors) do look a lot like exponential growth. Except 2015 was way above the trend line (and 2014 & 2016 a bit above too).

If you take the first and last data points (4.1 in 2011 & 83.3 in 2019), it's a 46% annual growth rate.

If you break it down into four two-year periods (which conveniently matches the various little sub-trends), it's:

2011-13: 46% annual growth (4.1 to 8.7)
2013-15: 123% annual growth (8.7 to 43.4)
2015-17: 3% annual growth (43.4 to 45.7)
2017-19: 35% annual growth (45.7 to 83.3)

2019 "all other money moved" is exactly where you'd project if you extrapolated the 2011-13 trend, although it does look like the trend has slowed a bit (even aside from the 2015 outlier) since 35% < 46%.
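
The growth rates above are compound annual rates, and they can be reproduced from the quoted figures. A small sketch, with the helper name `annual_growth_rate` made up for illustration and the money-moved numbers copied from this comment:

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# "All other money moved" figures as quoted in the comment.
money_moved = {2011: 4.1, 2013: 8.7, 2015: 43.4, 2017: 45.7, 2019: 83.3}

years = sorted(money_moved)
period_rates = []
for y0, y1 in zip(years, years[1:]):
    rate = annual_growth_rate(money_moved[y0], money_moved[y1], y1 - y0)
    period_rates.append(rate)
    print(f"{y0}-{y1}: {rate:.0%} annual growth")

overall = annual_growth_rate(money_moved[2011], money_moved[2019], 2019 - 2011)
print(f"2011-2019: {overall:.0%} annual growth")

# Extrapolate the 2011-13 rate out to 2019 and compare with the actual 83.3.
projected_2019 = money_moved[2011] * (1 + period_rates[0]) ** 8
print(f"2019 projected from the 2011-13 trend: {projected_2019:.1f}")
```

Extrapolating the 2011-13 rate for eight years gives a 2019 value of roughly 83, which is why the actual 2019 figure sits on that trend line.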

If GiveWell shares the "number of donors" count for each year that trend might be smoother (less influenced by a few very large donations), and more relevant for this question of how much EA has been growing.

Funding from Open Phil / Good Ventures looks more like a step function, with massive ramping up in 2013-16 and then a plateau (with year-to-year noise). Which is what you might expect from a big foundation - they can ramp up spending much faster than what you'd see with organic growth, but that doesn't represent a sustainable exponential trend (if Good Ventures had kept ramping up at the same rate then they would have run out of money by now).

The GWWC pledge data look like linear growth since 2014, rather than exponential growth or a plateau.

On the whole it looks like there has been growth over the past few years, though the growth rate is lower than it was in 2012-16 and the amount & shape of the growth differs between metrics.

Comment by Unnamed on Covid 3/4: Declare Victory and Leave Home · 2021-03-05T01:04:42.560Z · LW · GW

It appears Operation Warp Speed had to be funded by raiding other sources because Congress couldn’t be bothered to fund it. As MR points out, this is a scandal because it was necessary, rather than because it was done. It’s scary, because it implies that under a different administration Operation Warp Speed could easily have not happened at all.

There are gaps in the reporting on Operation Warp Speed funding, because apparently a bunch of the money that Congress did allocate for vaccines hasn't been spent yet. I don't understand why the White House spent other money but not that money.

Comment by Unnamed on "New EA cause area: voting"; or, "what's wrong with this calculation?" · 2021-02-27T03:31:04.145Z · LW · GW

Previous discussions:

Voting is like donating thousands of dollars to charity

If you care about social impact, why is voting important?

Comment by Unnamed on Avoid Contentious Terms · 2021-02-24T04:38:06.341Z · LW · GW

There are advantages to this style of writing even when the general term isn't contentious.

These kinds of concrete descriptions encourage readers to look at the world and see what's there, rather than engaging primarily with you and your concepts.

This can be good for people who know less about the topic, since looking at the world has fewer prerequisites. And it can be good for people who know more about the topic, since they can gain texture and depth by looking at new examples.

Though with non-contentious topics it's easier to add a general term at the end as a label to remember, or to tie the post into a larger conversation, without overshadowing the rest of the post.

Comment by Unnamed on The Prototypical Negotiation Game · 2021-02-21T22:23:31.215Z · LW · GW

Related: Insights from 'The Strategy of Conflict'

Comment by Unnamed on Incentive Problems With Current Forecasting Competitions. · 2021-02-15T10:40:59.802Z · LW · GW

The full-blown process of in-depth contract negotiations, etc., is presumably beyond the scope of the current competitive forecasting arena. 

One of the main things that I get out of the sports comparison is that it points to a different way of using (and thinking of) metrics. The obvious default, with forecasting, is to think of metrics as possible scoring rules, where the person with the highest score wins the prize (or appears first on the leaderboard). In that case, it's very important to pick a good metric, which provides good incentives. An alternative is to treat human judgment as primary, whether that means a committee using its judgment to pick which forecasters win prizes, or forecasters voting on an all-star team, or an employer trying to decide who to hire to do some forecasting for them, or just who has street cred in the forecasting community. And metrics are a way to try to help those people be more informed about forecasters' abilities & performance, so that they'll make better judgments. In that case, the standards for what is a good metric to include are very different. (There's also a third use case for metrics, where the forecaster uses metrics about their own performance to try to get better at forecasting.)

Sports also provide an example of what this looks like in action, what sorts of stats exist, how they're presented, who came up with them, what sort of work went into creating them, how they evaluate different stats and decide which ones to emphasize, etc.  And it seems plausible that similar work could be done with forecasting, since much of that work was done by sports fans who are nerds rather than by the teams; forecasting has fewer fans but a higher nerd density. I did some brainstorming in another comment on some potential forecasting stats which draws a lot of inspiration from that; not sure how much of it is retreading familiar ground.

Comment by Unnamed on Incentive Problems With Current Forecasting Competitions. · 2021-02-15T09:45:19.972Z · LW · GW

Here's a brainstorm of some possible forecasting metrics which might go in those tables (probably I'm reinventing some wheels here; I know more about existing metrics for sports than for forecasting):

  • Leading Indicator: get credit for making predictions if the consensus then moves in the same direction over the next hours / days / n predictions (alternate version: only if that movement winds up being towards the true outcome)
  • Points Relative to Your Expectation: each forecast has an expected score according to that forecast (e.g., if the consensus is 60% and you say 80%, you think there's a 0.8 chance you'll gain points for doing better than the consensus and a 0.2 chance you'll lose points for doing worse than consensus). Report expected score alongside actual score, or report the ratio actual/expected. If that ratio is > 1, that means you've been underconfident or (more likely) lucky. Also, expected score is similar to "total number of forecasts", weighted by boldness of forecasts. You could also have a column for the consensus expected score (in the example: your expected score if there was only a 0.6 chance you'd gain points and a 0.4 chance you'd lose points).
  • Marginal Contribution to Collective Forecast: have some way of calculating the overall collective forecast on each question (which could be just a simple average, or could involve fancier stuff to try to make it more accurate including putting more weight on some people's forecasts than others). Also calculate what the overall collective forecast would have been if you'd been absent from that question. You get credit for the size of the difference between those two numbers. (Alternative versions: you only get credit if you moved the collective forecast in the right direction, or you get negative credit if you moved it in the wrong direction.)
  • Trailblazer Score: use whichever forecasting accuracy metric (e.g. Brier score relative to consensus) while only including cases where a person's forecast differed from the consensus at the time by at least X amount. Relevant in part because there might be different skillsets to noticing that the consensus seems off and adjusting a bit in the right direction vs. coming up with your own forecast and trusting it even if it's not close to consensus. (And the latter skillset might be relevant if you're making forecasts on your own without the benefit of having a platform consensus to start from.)
  • Market Mover: find some way to track which comments lead to people changing their forecasts. Credit those commenters based on how much they moved the market. (alternative version: only if it moved towards truth)
  • Pseudoprofit: find some way to transform people's predictions into hypothetical bets against each other (or against the house), track each person's total profit & total amount "bet". (I'm not sure if this leads to different calculations or if it's just a different gloss on the same calculations.)
  • Splits: tag each question, and each forecast, with various features. Tags by topic (coronavirus, elections, technology, etc.), by what sort of event it's about (e.g. will people accomplish a thing they're trying to do), by amount of activity on the question, by time till event (short term vs. medium term vs. long term markets), by whether the question is binary or continuous, by whether the forecast was placed early vs. middle vs. late in the duration of the question, etc. Be able to show each scoring table only for the subset of forecasts that fit a particular tag. 
  • Predicted Future Rating: On any metric, you can set up formulas to predict what people will score on that metric over the next (period of time / set of markets). A simple way to do that is to just predict future scores on that metric based on past scores on the same metric, with some regression towards the mean, using historical data to estimate the relationship. But there are also more complicated things using past performance on some metrics (especially less noisy ones) to help predict future performance on other metrics. And also analyses to check whether patterns in past data are mostly signal or noise (e.g. if a person appears to have improved over time, or if they have interesting splits). (Finding a way to predict future scores is a good way to come up with a comprehensive metric, since it involves finding an underlying skill from among the noise. And the analyses can also provide information about how important different metrics are, which ones to include in the big table, which ones to make more prominent.)
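
Two of the metrics in the list above, Points Relative to Your Expectation and Marginal Contribution to Collective Forecast, are simple enough to sketch. This is one possible reading rather than a definitive formula: it assumes a log scoring rule measured relative to the consensus and a plain average as the collective forecast, neither of which the comment specifies, and all function names are hypothetical.

```python
import math
from typing import List


def relative_log_score(forecast: float, consensus: float, outcome: bool) -> float:
    """Log score of `forecast` minus log score of `consensus` (assumed scoring rule)."""
    if outcome:
        return math.log(forecast / consensus)
    return math.log((1 - forecast) / (1 - consensus))


def expected_relative_score(forecast: float, consensus: float, belief: float) -> float:
    """Expected relative score if the event happens with probability `belief`."""
    return (belief * relative_log_score(forecast, consensus, True)
            + (1 - belief) * relative_log_score(forecast, consensus, False))


def marginal_contribution(all_forecasts: List[float], mine_index: int) -> float:
    """Shift in a simple-average collective forecast caused by one forecaster."""
    with_me = sum(all_forecasts) / len(all_forecasts)
    others = all_forecasts[:mine_index] + all_forecasts[mine_index + 1:]
    without_me = sum(others) / len(others)
    return with_me - without_me


# The example from the list: consensus is 60% and you say 80%.
own_expectation = expected_relative_score(0.8, 0.6, belief=0.8)
consensus_expectation = expected_relative_score(0.8, 0.6, belief=0.6)

# Hypothetical question with four forecasters; yours is the last entry.
shift = marginal_contribution([0.6, 0.55, 0.65, 0.8], mine_index=3)
```

With these assumptions `own_expectation` is positive (you expect to beat the consensus), `consensus_expectation` is negative (the consensus expects you to lose points), and `shift` is the amount by which your forecast moved the average. Comparing a forecaster's actual total against the sum of their `own_expectation` values gives the actual/expected ratio described in the list.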
Comment by Unnamed on Covid 2/11: As Expected · 2021-02-15T08:25:30.904Z · LW · GW

The thing that I was more surprised by, looking at the scoring system, is that Metaculus is set up as a platform for maintaining a forecast rather than as a place where you make a forecast at a particular time. (If I'm understanding the scoring correctly.) 

Metaculus scores your current forecast at each moment, from the moment you first enter a forecast on the question until the moment the question closes. Where "your current forecast" at each moment is the most recent number that you entered, and the only thing that happens when you enter an updated prediction is that for the rest of the moments (until you update it again) "your current forecast" will be a different number. Every moment gets equal weight regardless of whether you last entered a number just now or three weeks ago (except that the very last moment when the question closes gets extra weight).

So it's not like a literal betting market where you're buying at the current market price at the moment that you make your forecast. If you don't keep updating your forecast, then you-at-that-moment is going up against the future consensus forecast.

So the scoring system rewards the activity of entering more questions, and also the activity of updating your forecasts on each of those questions again and again to keep them up-to-date.
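
The "every moment gets equal weight" behavior can be illustrated with a time-averaged score over a step-function forecast. This is only a sketch of the idea as described in this comment, not Metaculus's actual scoring formula: it uses a Brier score as a stand-in, ignores the extra weight on the closing moment, and the function name is invented.

```python
from typing import List, Tuple


def time_averaged_brier(updates: List[Tuple[float, float]],
                        close_time: float,
                        outcome: bool) -> float:
    """Average Brier score over time for a step-function forecast.

    `updates` is a list of (time, probability) pairs sorted by time; each
    probability stays in force until the next update or the close.
    Lower is better.
    """
    target = 1.0 if outcome else 0.0
    start = updates[0][0]
    total = 0.0
    for i, (t, p) in enumerate(updates):
        end = updates[i + 1][0] if i + 1 < len(updates) else close_time
        total += (end - t) * (p - target) ** 2
    return total / (close_time - start)


# Hypothetical 30-day question that resolves positively.
stale = time_averaged_brier([(0, 0.5)], close_time=30, outcome=True)
maintained = time_averaged_brier([(0, 0.5), (10, 0.7), (20, 0.9)],
                                 close_time=30, outcome=True)
```

In this toy example the forecaster who enters 50% once and never returns is scored on that 50% for all 30 days, while the one who keeps updating ends up with a noticeably lower (better) average.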

Comment by Unnamed on Covid 2/11: As Expected · 2021-02-13T09:43:45.248Z · LW · GW

There was a LessWrong post about this a while back?

Comment by Unnamed on Why I Am Not in Charge · 2021-02-09T19:59:22.211Z · LW · GW

I was also imagining the distinctions of

adaptation-executers vs. fitness-maximizers


selection + unconscious reinforcement vs. conscious strategizing

which are similar.

Comment by Unnamed on 2019 Review: Voting Results! · 2021-02-06T03:28:13.787Z · LW · GW

And neither of you voted for it!

Comment by Unnamed on 2019 Review: Voting Results! · 2021-02-03T05:02:35.959Z · LW · GW

Seems like a good thing to check in principle, but my guess is it won't make much difference for this or other posts. AI posts got about as many nonzero votes as other posts, and the ranking of posts by avg vote is almost the same as the official ranking by total votes.

Comment by Unnamed on 2019 Review: Voting Results! · 2021-02-03T04:58:46.800Z · LW · GW

For the 2019 Review, I think it would've helped if you/Rob/others had posted something like this as reviews of the post. Then voters would at least see that you had this take, and maybe people who disagree would've replied there which could've led to some of this getting hashed out in the comments.

Comment by Unnamed on [deleted post] 2021-02-01T05:15:13.367Z

Crazy story: one time I went to and generated a 20 digit string and it was 89921983981118509034.

Comment by Unnamed on What is up with spirituality? · 2021-01-27T07:28:50.068Z · LW · GW

One of the standard stories is that it's about social cohesion. Especially with rituals done as a group, and other features like visibly taking on costly restrictions in a way that demonstrates buy-in.

Sosis & Alcorta (2003). Signaling, solidarity, and the sacred: The evolution of religious behavior

Comment by Unnamed on Leaky Delegation: You are not a Commodity · 2021-01-27T01:37:56.236Z · LW · GW

The parts of this that are about factors that raise the financial cost of delegation seem less relevant than the parts about quality, personalization, learning, etc.

I'd break down the decision of whether to delegate into three levels of model simplicity:

1. Abstraction. A simple back-of-the-envelope calculation on the biggest costs & benefits. Usually just money vs. time, leading to an estimate that delegating will cost $X for each hour freed up.

2. Concreteness. Look at the world, and at what will be different if you delegate or not. This can include more accurate estimates of time & money (maybe actually doing the delegation uses a bunch of time), or maybe the particular minutes you're freeing up are more or less valuable than your typical neutral minute. And a lot of it is looking at other costs or benefits of delegation that didn't make it into your simple model but are obvious once you look at them, involving things like quality, delay, distraction, reliability, and so on.

3. Subtlety. These are things that aren't necessarily obvious when you look at them, like the value of learning, or the sense of self-sufficiency that comes from being able to do things yourself, or the sense of capability & possibility that comes from being able to find ways to get other people to do things that you want. Or maybe it turns out that you're in a better mood & more engaged with the world on evenings when you've cooked your own dinner, or you're in a worse mood & more withdrawn on days when you had to do a bunch of vacuuming. 
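
The level 1 back-of-the-envelope estimate, and the level 2 refinement where arranging the delegation itself eats some time, might look like this. A minimal sketch with made-up numbers; the function name and the `overhead_hours` parameter are illustrative assumptions.

```python
def cost_per_hour_freed(price: float,
                        hours_saved: float,
                        overhead_hours: float = 0.0) -> float:
    """Dollars paid per hour of your own time actually freed up."""
    net_hours = hours_saved - overhead_hours
    if net_hours <= 0:
        raise ValueError("delegating frees up no time once overhead is counted")
    return price / net_hours


# Level 1 (abstraction): $30 to avoid a 2-hour chore (hypothetical numbers).
abstract_estimate = cost_per_hour_freed(price=30.0, hours_saved=2.0)

# Level 2 (concreteness): arranging the delegation itself takes half an hour.
concrete_estimate = cost_per_hour_freed(price=30.0, hours_saved=2.0,
                                        overhead_hours=0.5)
```

With these numbers the estimate rises from $15 to $20 per hour freed once the overhead is counted; the level 3 considerations do not reduce to a formula like this.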

A nice thing about this breakdown is that you don't have to figure out a thing at a higher level if it has already been accounted for at an earlier level. So the specifics of what goes into the price of laundry delivery don't really matter once you know the price - regardless of whether the price is influenced by overqualified workers, perks you don't care about, or whatever, the deal is that you either do your own laundry or you pay $Y for these guys to do it. (Unless you think you can find another laundry service that's better or cheaper.)

This simplifies things. It reduces the number of questions you need to ask yourself, so you can focus on the harder to track things.

Comment by Unnamed on The Real Rules Have No Exceptions · 2021-01-25T10:06:19.306Z · LW · GW

It seems like the core thing that this post is doing is treating the concept of "rule" as fundamental. 

If you have a general rule plus some exceptions, then obviously that "general rule" isn't the real process that is determining the results. And noticing that (obvious once you look at it) fact can be a useful insight/reframing.

The core claim that this post is putting forward, IMO, is that you should think of that "real process" as being a rule, and aim to give it the virtues of good rules such as being simple, explicit, stable, and legitimate (having legible justifications).

An alternative approach is to step outside of the "rules" framework and get in touch with what the rule is for - what preferences/values/strategy/patterns/structures/relationships/etc. it serves. Once you're in touch with that purpose, then you can think about both the current case, and what will become of the "general rule", in that light. This could end up with an explicitly reformulated rule, or not.

It seems like treating the "real process" as a rule is more fitting in some cases than others, a better fit for some people's style of thinking than for other people's, and also something that a person could choose to aim for more or less.

I think I'd find it easier to think through this topic if there was a long, diverse list of brief examples.

Comment by Unnamed on Building up to an Internal Family Systems model · 2021-01-25T08:25:36.284Z · LW · GW

The back-and-forth (here and elsewhere) between Kaj & pjeby was an unusually good, rich, productive discussion, and it would be cool if the book could capture some of that. Not sure how feasible that is, given the sprawling nature of the discussion.

Comment by Unnamed on Dishonest Update Reporting · 2021-01-24T06:18:59.997Z · LW · GW

This post seems to me to be misunderstanding a major piece of Paul's "sluggish updating" post, and clashing with Paul's post in ways that aren't explicit.

The core of Paul's post, as I understood it, is that incentive landscapes often reward people for changing their stated views too gradually in response to new arguments/evidence, and Paul thinks he has often observed this behavioral pattern which he called "sluggish updating." Paul illustrated this incentive landscape through a story involving Alice and Bob, where Bob is thinking through his optimal strategy, since that's a convenient way to describe incentive landscapes. But that kind of intentional strategic thinking isn't how the incentives typically manifest themselves in behavior, in Paul's view (e.g., "I expect this to result in unconscious bias rather than conscious misrepresentation. I suspect this incentive significantly distorts the beliefs of many reasonable people on important questions"). This post by Zvi misunderstands this as Paul describing the processes that go on inside the heads of actual Bobs. This loses track of the important distinction (which is the subject of multiple other LW Review nominees) between the rewards that shape an agent's behavior and the agent's intentions. It also sweeps much of the disagreement between Paul & Zvi's posts under the rug.

A few related ways the views in the two posts clash:

This post by Zvi focuses on dishonesty, while Paul suggests that unconsciously distorted beliefs are the typical case. This could be because Zvi disagrees with Paul and thinks that dishonesty is the typical case. Or it could be that Zvi is using the word "dishonest" broadly - he mostly agrees with Paul about what happens in people's heads, but applies the "dishonesty" frame in places where Paul wouldn't. Or maybe Zvi is just choosing to focus on the dishonest subset of cases. Or some combination of these.

Zvi focuses on cases where Bob is going to the extreme in following these incentives, optimizing heavily for it and propagating it into his thinking. "This is a world where all one cares about is how one is evaluated, and lying and deceiving others is free as long as you’re not caught." "Bob’s optimal strategy is full anti-epistemology." Paul seems especially interested in cases where pretty reasonable people (with some pretty good features in their epistemics, motivations, and incentives) still sometimes succumb to these incentives for sluggishness. Again, it's unclear how much of this is due to Zvi & Paul having different beliefs about the typical case and how much is about choosing to focus on different subsets of cases (or which cases to treat as central for model-building).

Paul's post is written from a perspective of 'Good epistemics don't happen by default', where thinking well as an individual involves noticing places where your mental processes haven't been aimed towards accurate beliefs and trying to do better, and social epistemics are an extension of that at the group level. Zvi's post is written from a perspective of 'catching cheaters', where good social epistemics is about noticing ways that people are something-like-lying to you, and trying to stop that from happening.

Zvi treats Bob as an adversary. Paul treats him as a potential ally (or as a state that you or I or anyone could find oneself in), and mentions "gaining awareness" of the sluggishness as one way for an individual to counter it.

Related to all of this, the terminology clashes (as I mentioned in a comment). I'd like to say a simple sentence like "Paul sees [?sluggishness?] as mainly due to [?unconscious processes?], Zvi as mainly due to [?dishonest update reporting?]" but I'm not sure what terms go in the blanks.

The "fire Bob" recommendation depends a lot on how you're looking at the problem space / which part of the problem space you're looking at. If it's just a recommendation for a narrow set of cases then I think it wouldn't apply to most of the cases that Paul was talking about in his "Observations in the wild", but if it's meant to apply more widely then that could get messy in ways that interact with the clashes I've described.

The other proposed solutions seem less central to these two posts, and to the clash between Paul & Zvi's perspectives.

I think there is something interesting in the contrast between Paul & Zvi's perspectives, but this post didn't work as a way to shine light on that contrast. It focuses on a different part of the problem space, while bringing in bits from Paul's post in ways that make it seem like it's engaging with Paul's perspective more than it actually does and make it confusing to look at both perspectives side by side.

Comment by Unnamed on Coherent decisions imply consistent utilities · 2021-01-15T00:22:56.762Z · LW · GW

Sounds like the thing that is typically called "regret aversion".

Comment by Unnamed on Covid 1/14: To Launch a Thousand Shipments · 2021-01-14T22:53:53.551Z · LW · GW

Crunching some numbers in a copy of the spreadsheet... Zvi's predictions are better than the naive model of assuming next week's numbers will be the same as this week's numbers.

Biggest improvement over the null model for predicting deaths (mean squared error is 47% as big), smallest improvement for positive test % (MSE 80% as big), in between for number of tests (MSE 67% as big).

Although if I instead look at the predicted weekly change and compare it to the actual change that week, all three sets of predictions are roughly equally accurate with correlations (predicted change vs. actual change) between .52 and .58.
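For anyone who wants to redo this kind of comparison, here is a rough sketch of the two calculations: the ratio of the predictions' MSE to the MSE of the naive "same as last week" model, and the correlation between predicted and actual weekly change. The numbers are made up for illustration (they are not from the spreadsheet), and the function names are my own.

```python
from statistics import mean


def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return mean((p - a) ** 2 for p, a in zip(predicted, actual))


def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def compare_to_naive(previous, predicted, actual):
    """Return (MSE ratio vs. the naive model, correlation of changes).

    previous:  last week's observed value for each prediction
    predicted: the forecast for this week
    actual:    what this week turned out to be
    """
    # The naive model predicts that this week equals last week.
    ratio = mse(predicted, actual) / mse(previous, actual)
    predicted_change = [p - prev for p, prev in zip(predicted, previous)]
    actual_change = [a - prev for a, prev in zip(actual, previous)]
    return ratio, pearson(predicted_change, actual_change)


# Made-up illustrative numbers, not real data.
previous = [100, 110, 125, 120]
predicted = [108, 120, 124, 118]
actual = [110, 125, 120, 115]

ratio, r = compare_to_naive(previous, predicted, actual)
print(f"MSE is {ratio:.0%} as big as the naive model's; change correlation r = {r:.2f}")
```

A ratio below 1 means the predictions beat the naive model on squared error; the correlation asks the separate question of whether the predicted direction and size of change track the actual change.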

Comment by Unnamed on Covid 1/14: To Launch a Thousand Shipments · 2021-01-14T20:10:59.682Z · LW · GW

When I read this bit:

Only 37% of all distributed doses have been given

I wondered how that would look translated into a delay. The number of doses given through January 13 equals the number of doses that had been distributed __ days earlier.

I see from the graph titled "The US COVID-19 Vaccine Shortfall" (and introduced with "We can start with how it’s gone so far:") that the answer is about 17 days.

This seems like a more natural framing - it matches the process of why many doses haven't been given yet, and it seems likely to be more stable as we project the curves forward over many weeks (and less dependent on the shape of the 'doses distributed' curve).

So now I'm wondering if the delay (now 17 days) is likely to get smaller over time, or larger, or stay about the same.
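The delay framing can be computed directly from the two cumulative curves. A minimal sketch, assuming you have a list of cumulative doses distributed by day (oldest first, last entry is today) and today's cumulative doses given; the function name and the numbers are hypothetical, not taken from the graph.

```python
def delay_in_days(distributed, given_today):
    """How many days ago had 'given_today' doses been distributed?

    distributed: cumulative doses distributed per day, oldest first,
                 with the last entry being today.
    Returns the number of days back to the most recent day on which
    cumulative distribution was still at or below given_today, or None
    if distribution already exceeded that on the first recorded day.
    """
    last = len(distributed) - 1
    for day in range(last, -1, -1):
        if distributed[day] <= given_today:
            return last - day
    return None


# Made-up illustrative numbers, in millions of doses.
distributed = [0, 5, 10, 15, 20, 25, 30]
print(delay_in_days(distributed, given_today=11))
```

Running this on each day's data in turn would show whether the delay is shrinking, growing, or holding steady.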

Comment by Unnamed on Dishonest Update Reporting · 2021-01-14T02:58:43.611Z · LW · GW

Seems like the terminology is still not settled well. 

There's a general thing which can be divided into two more specific things.

General Thing: The information points to 50%, the incentive landscape points to 70%, Bob says "70%".

Specific Thing 1: The information points to 50%, the incentive landscape points to 70%, Bob believes 50% and says "70%".

Specific Thing 2: The information points to 50%, the incentive landscape points to 70%, Bob believes and says "70%".

There are three Things and just two names, so the terminology is at least incomplete.

"Dishonest update reporting" sounds like the name of Specific Thing 1.

In Paul's post "sluggish updating" referred to the General Thing, but Dagon's argument here is that "sluggish updating" should only refer to Specific Thing 2. So there's ambiguity.

It seems most important to have a good name for the General Thing. And that's maybe the one that's nameless? Perhaps "sluggish update reporting", which can happen either because the updating is sluggish or because the reporting is sluggish/dishonest. Or "sluggish social updating"? Or something related to lightness? Or maybe "sluggish updating" is ok despite Dagon's concerns (e.g. a meteorologist updating their forecast could refer to changes that they make to the forecast that they present to the world).

Comment by Unnamed on Any examples of people analyzing/critiquing scientific studies or papers? · 2021-01-13T23:46:25.822Z · LW · GW

A couple things that are maybe not exactly what you're looking for but are nearby and probably somewhat useful:

The blog Data Colada (example, example2)

Elizabeth's "epistemic spot check" series (example)

Comment by Unnamed on Any examples of people analyzing/critiquing scientific studies or papers? · 2021-01-13T23:44:28.501Z · LW · GW

Here is a thing I wrote 10 years ago assessing an N-back study. That's an easy-for-me-to-remember example, where I also remember that the writeup comes pretty close to reflecting how I was thinking through things as I was looking at the paper.

Comment by Unnamed on Johannes Kepler, Sun Worshipper · 2021-01-11T20:56:35.948Z · LW · GW

Well, the sun being the only object in our solar system that emits light is evidence for it being at the center.

It seems likely that there's something special about whichever body is in the center of the solar system. A lot of astronomers thought the Earth was special for being made of rock & water, and that this was related to the Earth being at the center, but they just conjectured that Mars & Venus & the other planets were made of something else. Whereas Kepler had much more direct observations about the sun's unique luminosity.

Aristarchus had a heliocentric model of the solar system in Ancient Greece, apparently motivated in large part by the fact that the sun was the largest object in the solar system.

In hindsight, we know that luminosity and size relative to neighbors are both highly correlated with being at the center of a solar system, with Aristarchus's size thing having a tighter causal relationship with centrality.

Comment by Unnamed on 100 Tips for a Better Life · 2021-01-05T10:45:00.493Z · LW · GW

Those public health official examples seem unrelated to tip #59 ("Those who generate anxiety in you and promise that they have the solution are grifters.").

I took hermanc1 to be pointing to how, in Feb-Mar 2020, the people who were saying scary sounding stuff (like using the word "pandemic") and proposing things to do about it were the ones who had insights and were telling it straight. Meanwhile many other people were calling those people out for "fearmongering" or spinning things to downplay the risk in order to prevent panic.

There are grifters who try to generate anxiety so they can sell you something. And also the world contains problems, and noticing problems can induce anxiety, and searching for & sharing (partial) solutions to problems is good. Maybe a sophisticated way of following tip #59 can distinguish between those, but the naive way of doing it can run into trouble and fail to see the smoke.

Comment by Unnamed on Covid 12/24: We’re F***ed, It’s Over · 2020-12-30T03:35:18.802Z · LW · GW

Back in March, there was a lot of concern that uncontrolled spread would overwhelm the medical system and some hope that delay would improve the standard of care. Do we have good estimates now of those two effects? They could influence IFR estimates by a fair amount.

Also, my understanding is that the number of infections could've shot way past herd immunity levels. Herd immunity is just the point at which the number of active infections starts declining rather than increasing, and if there are lots of active infections at that time then they can spread to much of the remaining people before dwindling.
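To put rough numbers on the overshoot point, here is a sketch using the textbook SIR model with homogeneous mixing and no interventions or behavior change (a big simplification, and my assumption rather than anything from the post). In that model the herd immunity threshold is 1 - 1/R0, while the final fraction infected z solves z = 1 - exp(-R0 * z). The R0 value below is only illustrative.

```python
import math


def herd_immunity_threshold(r0):
    """Fraction immune at which active infections start to decline."""
    return 1 - 1 / r0


def final_attack_rate(r0, iterations=200):
    """Fraction eventually infected in an unmitigated simple SIR epidemic.

    Solves z = 1 - exp(-r0 * z) by fixed-point iteration, assuming
    r0 > 1 and a negligible initial number of infections.
    """
    z = 0.5  # any starting point in (0, 1] converges to the nonzero root
    for _ in range(iterations):
        z = 1 - math.exp(-r0 * z)
    return z


r0 = 2.0  # illustrative value only
print(f"herd immunity threshold: {herd_immunity_threshold(r0):.0%}")
print(f"final fraction infected: {final_attack_rate(r0):.0%}")
```

With R0 = 2 the threshold is 50%, but the simple model ends with roughly 80% infected, because the infections that are active when the threshold is crossed keep spreading for a while.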

Comment by Unnamed on Morality as "Coordination", vs "Do-Gooding" · 2020-12-29T06:47:24.113Z · LW · GW

I've had similar thoughts; the working title that I jotted down at some point is "Two Aspects of Morality: Do-Gooding and Coordination." A quick summary of those thoughts:

Do-gooding is about seeing some worlds as better than others, and steering towards the better ones. Consequentialism, basically. A widely held view is that what makes some worlds better than others is how good they are for the beings in those worlds, and so people often contrast do-gooding with selfishness because do-gooding requires recognizing that the world is full of moral patients.

Coordination is about recognizing that the world is full of other agents, who are trying to steer towards (at least somewhat) different worlds. It's about finding ways to arrange the efforts of many agents so that they add up to more than the sum of their parts, rather than less. In other words, try for: many agents combine their efforts to get to worlds that are better (according to each agent) than the world that that agent would have reached without working together. And try to avoid: agents stepping on each other's toes, devoting lots of their efforts to undoing what other agents have done, or otherwise undermining each other's efforts. Related: game theory, Moloch, decision theory, contractualism.

These both seem like aspects of morality because:

  • "moral emotions", "moral intuitions", and other places where people use words like "moral" arise from both sorts of situations
  • both aspects involve some deep structure related to being an agent in the world, neither seems like just messy implementation details for the other
  • a person who is trying to cultivate virtues or become a more effective agent will work on both

Comment by Unnamed on Great minds might not think alike · 2020-12-26T21:12:17.469Z · LW · GW

Related to section I: Dunning, Meyerowitz, & Holzberg (1989) Ambiguity and self-evaluation: The role of idiosyncratic trait definitions in self-serving assessments of ability. From the abstract:

When people are asked to compare their abilities to those of their peers, they predominantly provide self-serving assessments that appear objectively indefensible. This article proposes that such assessments occur because the meaning of most characteristics is ambiguous, which allows people to use self-serving trait definitions when providing self-evaluations. Studies 1 and 2 revealed that people provide self-serving assessments to the extent that the trait is ambiguous, that is, to the extent that it can describe a wide variety of behaviors.

Comment by Unnamed on DanielFilan's Shortform Feed · 2020-12-23T04:27:57.495Z · LW · GW

It seems clear that we want politicians to honestly talk about what they're intending to do with the policies that they're actively trying to change (especially if they have a reasonable chance of enacting new policies before the next election). That's how voters can know what they're getting.

It's less obvious how this should apply to their views on things which aren't going to be enacted into policy. Three lines of thinking that point in the direction of maybe it's good for politicians to keep quiet about (many of) their unpopular views:

It can be hard for listeners to tell how likely the policy is to be enacted, or how actively the politician will try to make it happen. I guess it's hard to fit into 5 words? e.g. I saw a list of politicians' "broken promises" on one of the fact checking sites, which was full of examples where the politician said they were in favor of something and then it didn't get enacted, and the fact checkers deemed that sufficient to count it as a broken promise. This can lead to voters putting too little weight on the things that they're actually electing the politician to do, e.g. local politics seems less functional if local politicians focus on talking about their views on national issues that they have no control over.

Another issue is that it's cheap talk. The incentive structure / feedback loops seem terrible for politicians talking about things unrelated to the policies they're enacting or blocking. Might be more functional to have a political system where politicians mostly talk about things that are more closely related to their actions, so that their words have meaning that voters can see.

Also, you can think of politicians' speech as attempted persuasion. You could think of voters as picking a person to go around advocating for the voters' hard-to-enact views (as well as to implement policies for the voters' feasible-to-enact views). So it seems like it could be reasonable for voters to say "I think X is bad, so I'm not going to vote for you if you go around advocating for X", and for a politician who personally favors X but doesn't talk about it to be successfully representing those voters.

Comment by Unnamed on Fusion and Equivocation in Korzybski's General Semantics · 2020-12-21T08:58:38.890Z · LW · GW

You can think of growth mindset as a deidentification, basically identical to that example of Anna the student except done by Anna about herself rather than by her teacher. "Yet" is a wedge that gets you to separate your concept of you-so-far from your concept of future you. "I'm bad at X" sneaks in an equivocation to imply "now and always."

Comment by Unnamed on Motive Ambiguity · 2020-12-20T04:50:48.591Z · LW · GW

I notice that many of these examples involve something like vice signalling - the person is destroying value in order to demonstrate that they have a quality which I (and most LWers) consider to be undesirable. It seems bad for the middle manager, politician, and start-up founder to aim for the shallow thing that they're prioritizing. And then they take the extra step of destroying something I do value in order to accentuate that. It's a combination that feels real icky.

The romantic dinner and the handmade gift examples don't have that feature. And those two cases feel more ambiguous - I can imagine versions of these where it seems good that the person is doing these things, and versions where it seems bad. I can picture a friend telling me "I took my partner out for their birthday to a restaurant that I don't really care for, but they just adore" and it being a heartwarming story, where it seems like something good is happening for my friend and their relationship.

Katja's recent post on Opposite Attractions points to one thing that seems good about taking your spouse to a restaurant that only they love - your spouse's life is full of things that you both like, and perhaps starved of certain styles of things that they like and you don't, and they could be getting something out of drawing from that latter category even if there's some sense in which they don't like it any more than a thing in the "you both like it" category. And there's something good about them getting some of those things within the relationship, of having the ground that the relationship covers not be limited to the intersection of "the things you like" and "the things your spouse likes" - your relationship mostly takes advantage of that part of the territory but sometimes it's good to explore other parts of it together. And I could imagine you bringing an attitude to the meal where you're tuned in to your spouse's experience, trying to take pleasure in how much they enjoy the meal, rather than being focused on your own food. And (this is the part where paying a cost to resolve motive ambiguity comes in directly) going to a restaurant that they love and you don't like seems like it can help set the context for this kind of thing - putting the information in common knowledge between you two that this is a special occasion, and what sort of special occasion it's trying to be. It seems harder to hit some of these notes in a context where both people love the food.

(There are also versions of the one-sided romantic dinner which seem worse, and good relationships where this version doesn't fit or isn't necessary.)

Comment by Unnamed on Writing tools for tabooing? · 2020-12-13T20:07:52.176Z · LW · GW

Can you tell your spellchecker that they're not words?

Comment by Unnamed on Number-guessing protocol? · 2020-12-07T21:33:51.157Z · LW · GW

If you're just doing this occasionally without recordkeeping, then it seems convenient to have the game result in "winners" rather than a more fine-grained score. But it could be fine to sometimes have multiple winners, or zero winners. Here's a simple protocol that does that:

The person who asks the question also defines what counts as "winning". e.g. "What's the value of such-and-such? Can anybody get it within 10%?" Then everyone guesses simultaneously, and all the people whose guesses are within 10% of the true value are "winners".

("Simultaneous" guessing can mean that first everyone comes up with their guess in their head, and then they take turns saying them out loud while on the honor system to not change their guess.)

Slightly more complicated, the asker could propose 2 standards of winning. "When did X happen? Grand prize if you guess the exact year, honorable mention if you get it within 5 years." Then if anyone guesses the exact year they're the big winner(s) and the people who get it within 5 years get the lesser glow of "honorable mention". And if no one guesses the exact year then the people who get it within 5 years feel more like winners.

If you continue farther in this direction you could get to one of Ericf's proposals. I think my version has lower barriers to entry, while Ericf's version could work better among people who use it regularly.
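The protocol above is simple enough to write down as code, in case that makes the rules clearer. This is just a sketch; the function names, player names, and guesses are made up.

```python
def winners(guesses, true_value, tolerance=0.10):
    """Everyone whose guess is within `tolerance` (as a fraction) of the truth.

    guesses: dict mapping each player's name to their numeric guess.
    May return several names, or an empty list if nobody is close enough.
    """
    allowed = tolerance * abs(true_value)
    return [name for name, guess in guesses.items()
            if abs(guess - true_value) <= allowed]


def two_tier(guesses, true_value, margin):
    """Two standards of winning: exact hits, then guesses within `margin`."""
    grand_prize = [n for n, g in guesses.items() if g == true_value]
    honorable = [n for n, g in guesses.items()
                 if g != true_value and abs(g - true_value) <= margin]
    return grand_prize, honorable


# "Can anybody get it within 10%?" with a true value of 100
print(winners({"Ann": 95, "Bo": 120, "Cy": 108}, true_value=100))

# "Grand prize for the exact year, honorable mention within 5 years"
print(two_tier({"Ann": 1969, "Bo": 1972, "Cy": 1980}, true_value=1969, margin=5))
```

Both functions can return empty lists, which matches the idea that it's fine to sometimes have zero winners.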

Comment by Unnamed on Real-Life Examples of Prediction Systems Interfering with the Real World (Predict-O-Matic Problems) · 2020-12-04T00:44:17.925Z · LW · GW

Note that Trump got around 63M votes in 2016, and around 71M in 2020, whereas Democrats got 66M and 75M respectively.

The 2020 results are 81M-74M with some votes still left to count. 75M-71M might have been the margin a few weeks ago when there were still a bunch more not-yet-counted votes.

Comment by Unnamed on Covid 12/3: Land of Confusion · 2020-12-03T22:46:11.751Z · LW · GW

Two minor corrections on the Denver Broncos section:

For whatever reason, Denver was told this weekend that the show had to go on, despite all four of its quarterbacks being ruled out due to contact tracing from their primary quarterback. No masks had been worn. 

If you think that represents gross incompetence and they should have held their backup backup backup quarterback in reserve like a designated survivor if they had no fifth option, you’d be right, but they did not think about that at the time.

The quarterback who got covid was their 3rd stringer (I think); he definitely wasn't their primary quarterback. Lock is their starter, Driskel is the one who got sick, and I believe their depth chart went Lock-Rypien-Driskel-Bortles. This is a downside of having 4 quarterbacks rather than 2 or 3 - more vectors into the quarterback room.

Also, the Broncos presumably did at least come across the idea of having a designated survivor reserve QB. They just decided not to do it. Some NFL teams have been keeping one of their quarterbacks apart - there have been news articles all year about the Buffalo Bills using Jake Fromm as their quarantined quarterback - and that news must've reached Denver.

Comment by Unnamed on Maybe Lying Can't Exist?! · 2020-11-15T02:23:07.803Z · LW · GW

Here's a toy example which should make it clearer that the probability assigned to the true state is not the only relevant update.

Let's say that a seeker is searching for something, and doesn't know whether it is in the north, east, south, or west. If the object is in the north, then it is best for the seeker to go towards it (north), worst for the seeker to go directly away from it (south), and intermediate for them to go perpendicular to it (east or west). The seeker meets a witness who knows where the thing is. The majority (2/3) of witnesses want to help the seeker find it and the rest (1/3) want to hinder the seeker's search. And they have common knowledge of all of this.

In this case, the witness can essentially just direct the seeker's search - if the witness says "it's north" then the seeker goes north, since 2/3 of witnesses are honest. So if it's north and the witness wants to hinder the seeker, they can just say "it's south". This seems clearly deceptive - it's hindering the seeker's search as much as possible by messing up their beliefs. But pointing them south does actually lead to a right-direction update on the true state of affairs, with p(north) increasing from 1/4 (the base rate) to 1/3 (the proportion of witnesses who aim to hinder). It's still a successful deception because it increases p(south) from 1/4 to 2/3, and that dominates the seeker's choice.
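The update can be checked with a few lines of Bayes' rule. This sketch assumes, as in the example, a uniform prior over the four directions, that helpful witnesses (2/3) always name the true direction, and that hindering witnesses (1/3) always name the opposite direction; the function and variable names are my own.

```python
from fractions import Fraction

DIRECTIONS = ["north", "east", "south", "west"]
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def posterior(statement, p_helper=Fraction(2, 3)):
    """Seeker's beliefs about the true direction after hearing `statement`."""
    prior = Fraction(1, 4)  # uniform base rate over the four directions
    joint = {}
    for truth in DIRECTIONS:
        likelihood = Fraction(0)
        if statement == truth:
            likelihood += p_helper  # a helper names the true direction
        if statement == OPPOSITE[truth]:
            likelihood += 1 - p_helper  # a hinderer names the opposite one
        joint[truth] = prior * likelihood
    total = sum(joint.values())
    return {direction: p / total for direction, p in joint.items()}


beliefs = posterior("south")
for direction in DIRECTIONS:
    print(direction, beliefs[direction])
```

Hearing "it's south" moves p(north) from 1/4 up to 1/3, but moves p(south) from 1/4 up to 2/3, so the seeker still heads south.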