Posts

A Cruciverbalist’s Introduction to Bayesian reasoning 2021-04-04T08:50:07.729Z
Systematizing Epistemics: Principles for Resolving Forecasts 2021-03-29T20:46:06.923Z
Resolutions to the Challenge of Resolving Forecasts 2021-03-11T19:08:16.290Z
The Upper Limit of Value 2021-01-27T14:13:09.510Z
Multitudinous outside views 2020-08-18T06:21:47.566Z
Update more slowly! 2020-07-13T07:10:50.164Z
A Personal (Interim) COVID-19 Postmortem 2020-06-25T18:10:40.885Z
Market-shaping approaches to accelerate COVID-19 response: a role for option-based guarantees? 2020-04-27T22:43:26.034Z
Potential High-Leverage and Inexpensive Mitigations (which are still feasible) for Pandemics 2020-03-09T06:59:19.610Z
Ineffective Response to COVID-19 and Risk Compensation 2020-03-08T09:21:55.888Z
Link: Does the following seem like a reasonable brief summary of the key disagreements regarding AI risk? 2019-12-26T20:14:52.509Z
Updating a Complex Mental Model - An Applied Election Odds Example 2019-11-28T09:29:56.753Z
Theater Tickets, Sleeping Pills, and the Idiosyncrasies of Delegated Risk Management 2019-10-30T10:33:16.240Z
Divergence on Evidence Due to Differing Priors - A Political Case Study 2019-09-16T11:01:11.341Z
Hackable Rewards as a Safety Valve? 2019-09-10T10:33:40.238Z
What Programming Language Characteristics Would Allow Provably Safe AI? 2019-08-28T10:46:32.643Z
Mesa-Optimizers and Over-optimization Failure (Optimizing and Goodhart Effects, Clarifying Thoughts - Part 4) 2019-08-12T08:07:01.769Z
Applying Overoptimization to Selection vs. Control (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 3) 2019-07-28T09:32:25.878Z
What does Optimization Mean, Again? (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 2) 2019-07-28T09:30:29.792Z
Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1) 2019-07-02T15:36:51.071Z
Schelling Fences versus Marginal Thinking 2019-05-22T10:22:32.213Z
Values Weren't Complex, Once. 2018-11-25T09:17:02.207Z
Oversight of Unsafe Systems via Dynamic Safety Envelopes 2018-11-23T08:37:30.401Z
Collaboration-by-Design versus Emergent Collaboration 2018-11-18T07:22:16.340Z
Multi-Agent Overoptimization, and Embedded Agent World Models 2018-11-08T20:33:00.499Z
Policy Beats Morality 2018-10-17T06:39:40.398Z
(Some?) Possible Multi-Agent Goodhart Interactions 2018-09-22T17:48:22.356Z
Lotuses and Loot Boxes 2018-05-17T00:21:12.583Z
Non-Adversarial Goodhart and AI Risks 2018-03-27T01:39:30.539Z
Evidence as Rhetoric — Normative or Positive? 2017-12-06T17:38:05.033Z
A Short Explanation of Blame and Causation 2017-09-18T17:43:34.571Z
Prescientific Organizational Theory (Ribbonfarm) 2017-02-22T23:00:41.273Z
A Quick Confidence Heuristic; Implicitly Leveraging "The Wisdom of Crowds" 2017-02-10T00:54:41.394Z
Most empirical questions are unresolveable; The good, the bad, and the appropriately under-powered 2017-01-23T20:35:29.054Z
Map:Territory::Uncertainty::Randomness – but that doesn’t matter, value of information does. 2016-01-22T19:12:17.946Z
Meetup : Finding Effective Altruism with Biased Inputs on Options - LA Rationality Weekly Meetup 2016-01-14T05:31:20.472Z
Perceptual Entropy and Frozen Estimates 2015-06-03T19:27:31.074Z
Meetup : Complex problems, limited information, and rationality; How should we make decisions in real life? 2013-10-09T21:44:19.773Z
Meetup : Group Decision Making (the good, the bad, and the confusion of welfare economics) 2013-04-30T16:18:04.955Z

Comments

Comment by Davidmanheim on LessWrong help desk - free paper downloads and more · 2021-04-19T09:12:43.559Z · LW · GW

I also just requested this on reddit.

Comment by Davidmanheim on LessWrong help desk - free paper downloads and more · 2021-04-19T09:12:12.362Z · LW · GW

Also just requested on reddit: https://www.reddit.com/r/Scholar/comments/mtwl4d/chapter_k_hoskin_1996_the_awful_idea_of/

Comment by Davidmanheim on LessWrong help desk - free paper downloads and more · 2021-04-19T08:55:46.402Z · LW · GW

Request: "K. Hoskin (1996) The ‘awful idea of accountability’: inscribing people into the measurement of objects. In Accountability: Power , Ethos and the Technologies of Managing, R. Munro and J. Mouritsen (Eds). London, International Thomson Business Press, and references therein."

(Cited by: Strathern, Marilyn (1997). "'Improving ratings': audit in the British University system". European Review. John Wiley & Sons. 5 (3): 305–321. doi:10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4.)

See Google Books, and Worldcat (Available in many UK universities, incl. Cambridge & Oxford, and in the NYPL, at MIT, etc.) 


Context: Looking for sources about the history of Goodhart's law, esp. as "quoted"/ paraphrased, seemingly by Strathern.

From Strathern's paper:
"When a measure becomes a target, it ceases to be a good measure. The more a 2.1 examination performance becomes an expectation, the poorer it becomes as a discriminator of individual performances. Hoskin describes this as ‘Goodhart’s law’, after the latter’s observation on instruments for monetary control which lead to other devices for monetary flexibility having to be invented. However, targets that seem measurable become enticing tools for improvement."

Comment by Davidmanheim on Wanting to Succeed on Every Metric Presented · 2021-04-14T11:28:10.726Z · LW · GW

Noting the obvious connection to Goodhart's law - and elsewhere I've described the mistake of pushing to maximize easy-to-measure / cognitively available items rather than true goals.

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-04-03T19:23:20.068Z · LW · GW

 Yeah, that's true. I don't recall exactly what I was thinking. 

Perhaps it was regarding time-weighting, and the difficulty of seeing what your score will be based on what you predict - but the Metaculus interface handles this well, modulo early closings, which screw lots of things up. Also, log-scoring is tricky when you have both continuous and binary outcomes, since they don't give similar measures - being well calibrated for binary events isn't "worth" as much, which seems perverse in many ways.

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-04-01T05:41:48.790Z · LW · GW

In many cases, yes. But for some events, the "obvious" answers are not fully clear until well after the event in question takes place - elections, for example.

Comment by Davidmanheim on How many micromorts do you get per UV-index-hour? · 2021-03-31T17:04:21.658Z · LW · GW

About 20% of Americans develop skin cancer during their lifetime, and the 5-year overall survival rate for melanoma is over 90 percent. Taking this as the mortality risk, i.e. ignoring timing and varied risk levels, it's a 2% risk of (eventual) death.
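Making the arithmetic explicit under those simplifying assumptions (treating the ~90% melanoma survival figure as if it applied to all skin cancers):

$$0.20 \times (1 - 0.90) = 0.02$$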

But risk of skin cancer depends on far more than sun exposure - and the more important determinant is frequency of sunbathing below age 30. Other factors that seem to matter are skin color, skin response (how much you burn), weight, and family history of cancers.
 

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-03-31T14:10:41.430Z · LW · GW

re: "Get this wrong" versus "the balance should be better," there are two different things that are being discussed. The first is about defining individual questions via clear resolution criteria, which I think is doe well, and the second is about defining clear principles that provide context and inform what types of questions and resolution criteria are considered good form.

A question like "will Democrats pass H.R.2280 and receive 51 votes in the Senate" is very well defined, but super-narrow, and easily resolved "incorrectly" if the bill is incorporated into another bill, or if an adapted bill is proposed by a moderate Republican and passes instead, or passed via some other method, or if it passes but gets vetoed by Biden. But it isn't an unclear question, and given the current way that Metaculus is run, would probably be the best way of phrasing the question. Still, it's a sub-par question, given the principles I mentioned. A better one would be "Will a bill such as H.R.2280 limiting or banning straw purchases of firearms be passed by the current Congress and enacted?" It's much less well defined, but the boundaries are very different. It also uses "passed" and "enacted", which have gray areas. At the same time, the failure modes are closer to the ones that we care about near the boundary of the question. However, given the current system, this question is obviously worse - it's harder to resolve, it's more likely to be ambiguous because a bill that does only some of the thing we care about is passed, etc.

Still, I agree that the boundaries here are tricky, and I'd love to think more about how to do this better.

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-03-30T18:37:09.642Z · LW · GW

I haven't said, and I don't think, that the majority of markets and prediction sites get this wrong. I think they navigate this without a clear framework, which I think the post begins providing. And I strongly agree that there isn't a slam-dunk-no-questions case for principles overriding rules, which the intro might have implied too strongly. I also agree with your point about downsides of ambiguity potentially overriding the benefits of greater fidelity to the intent of a question, and brought it up in the post. Still, excessive focus on making rules on the front end, especially for longer-term questions and ones where the contours are unclear, rather than explicitly being adaptive, is not universally helpful. 

And clarifications that need to change the resolution criteria mid-way are due to either bad questions or badly handled resolutions. At the same time, while there are times that avoiding ambiguity is beneficial, there are also times when explicitly addressing corner cases to make them unambiguous ("if the data is discontinued or the method is changed, the final value posted using the current method will be used") makes the question worse, rather than better.

Lastly, I agree that one general point I didn't state, but agree with, was that "where the spirit and letter of a question conflict, the question should be resolved based on the spirit." I mostly didn't make an explicit case for this because I think it's under-specified as a claim. Instead, the more specific claims I would make are: 
1) When the wording of a question seems ambiguous, the intent should be an overriding reason to choose an interpretation.
2) When the wording of a question is clear, the intent shouldn't change the resolution.

Comment by Davidmanheim on Thirty-three randomly selected bioethics papers · 2021-03-30T05:02:57.319Z · LW · GW

As an aside, I find it bizarre that Economics gets put at 9 - I think a review of what gets done in top econ journals would cause you to update that number down by at least 1. (It's not usually very bad, but it's often mostly useless.) And I think it's clear that lots of Econ does, in fact, have a replication crisis. (But we'll see if that is true as some of the newer replication projects actually come out with results.)

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-18T19:36:55.372Z · LW · GW

Generally agree that there's something interesting here, but I'm still skeptical that in most prediction market cases there would be enough money across questions, and enough variance in probabilities, for this to work well.

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-18T19:34:47.361Z · LW · GW

For betting markets, the market maker may need to manage the odds differently, and for prediction markets, the issue is that otherwise you're paying people (in lower Brier scores) for watching the games, rather than for being good predictors beforehand. (The way that time-weighted Brier scores work is tricky - you could get it right, but in practice it seems that last-minute failures to update are fairly heavily penalized.)
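A minimal toy sketch of that effect (my own construction, using a uniform time-average of Brier scores rather than any platform's actual rule): a forecaster who reacts to late in-game news beats an otherwise identical forecaster who doesn't, purely by watching the end of the game.

```python
import numpy as np

# The outcome becomes nearly certain at t = 80 (e.g., a late-game blowout).
outcome = 1
p_updater = np.where(np.arange(100) < 80, 0.7, 0.99)  # reacts to late news
p_sleeper = np.full(100, 0.7)                          # never reacts

for name, p in [("updater", p_updater), ("sleeper", p_sleeper)]:
    brier = (p - outcome) ** 2            # Brier penalty at each time step
    print(name, round(brier.mean(), 4))   # time-averaged; lower is better
```

The stale forecast pays the full penalty over every remaining time step, which is why a last-minute failure to update is so costly.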

Comment by Davidmanheim on Dark Matters · 2021-03-18T19:32:07.890Z · LW · GW

That's good to hear. But if "he started at 60%," and he "still thinks dark matter is overwhelmingly likely," he has updated in the wrong direction. (Perhaps he thought it was 60% likely that the LHC would find dark matter? In which case I still think that he should update away from "overwhelmingly likely" - it's weak evidence against the hypothesis, but unless he started out almost certain, "overwhelmingly" seems to go a bit too far.)

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-16T18:24:35.971Z · LW · GW

Yes, that was exactly what I was thinking of, but 1) I didn't remember the name, and 2) I wanted a concrete example relevant to prediction markets.

And I agree it's hard to estimate in general, but the problem can still be relevant in many cases - which is why I used my example. In the baseball game, if the market closes before the game begins, we don't have a model as good as the market; but once the game is 7/9ths complete, we can do better than the pre-game market prediction.

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-16T06:06:18.011Z · LW · GW

It's an interesting idea, but one that seems to have very high costs for forecasters in keeping the predictions updated and coherent.

If we imagine that we pay forecasters the market value of their time, an active forecasting question with a couple dozen people spending a half hour each updating their forecast "costs" thousands of dollars per week. Multiplying that, even when accounting for reduced costs for similar questions, seems not worth the cost.
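With illustrative numbers (the hourly rate and update cadence here are my assumptions, just to make the multiplication concrete):

$$24 \text{ forecasters} \times 0.5 \text{ h/update} \times \$100/\text{h} \times 2 \text{ updates/week} \approx \$2{,}400\text{/week}$$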

Comment by Davidmanheim on Dark Matters · 2021-03-15T22:28:08.967Z · LW · GW

"isn't it quite odd that looking around at different parts of the universe seems to produce such a striking level of agreement on how much missing mass there is?"

But they don't. Dark matter, as a theory, posits that the amount of mass that "must be there somewhere" varies in amount and distribution in an ad-hoc fashion to explain the observations. I think it's likely that whatever is wrong with the theory, on the other hand, isn't varying wildly by where in the universe it is. Any such explanation would (need to) be more parsimonious, not less so.

And I agree that physics isn't obligated to make things easy to find - but when the dark matter theory was postulated, they guessed it was a certain type of WIMP, and then kept not finding it. Postulating that it must be there somewhere, and physics doesn't need to make it easy, isn't properly updating against the theory as each successive most likely but still falsifiable guess has been falsified.

Comment by Davidmanheim on Dark Matters · 2021-03-15T09:00:13.913Z · LW · GW

This was fantastic, and still leaves me with the conclusion that "dark matter" isn't a specific hypothesis; it's a set of reasons to think we're missing something in our theories which isn't modified gravity.

That is, saying "Given that everything we see is consistent with Gravity being correct, we conclude that there is not enough baryonic matter to account for what we see," doesn't prove the existence of large amounts of non-baryonic matter. Instead, the evidence provides strong indication that either A) there is something we can't see that has some properties of non-baryonic matter, or B) something is wrong with some theory which isn't what people propose as modified gravity. We knew enough to say that decades ago. We've looked for every type of non-baryonic matter we can think of, and have only been able to eliminate possibilities. The evidence is still pointing to "something else," and we have some actual or claimed physical objects that aren't actually prohibited as an answer - but nothing pointing to them.

This sounds a lot like what we would have said pre-special relativity about Galileo's relativity (which Newton hated because it didn't allow distinguishing between relative and absolute motion), and electromagnetism, which didn't seem to follow the rules for indistinguishability of relative and absolute motion, but was much too good of a theory to be wrong. 

Pre-Einstein, we had good reasons to think there was something there that lets relativity and Maxwell be consistent, but didn't know what it was. They DID have reason to think the answer wasn't "Maxwell was wrong about the absolute speed limit for light," just like we know the answer isn't "gravity just works differently," but actually plugging the hole required a new conceptual model.

In Bayesian terms, we should have some prior on "gravity + visible mass," some prior on "modified gravity," some prior on "invisible mass that accounts for observation," and some prior on "something else will be found for a theory." Every piece of evidence seems like it's at least as strong evidence for #4 as it is for #3, and our continued lack of success finding the best candidates for what the invisible mass is made of is better evidence for #4 than #3.
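To make that concrete, here is a minimal sketch with made-up numbers (the priors and likelihoods are purely illustrative, not estimates):

```python
# Hypotheses: #1 gravity + visible mass, #2 modified gravity,
# #3 invisible mass, #4 something else. All numbers are illustrative.
priors = [0.01, 0.15, 0.42, 0.42]
# Assumed likelihood of "another WIMP search comes up empty" under each
# hypothesis; the key assumption is that a null result is likelier under #4.
likelihoods = [0.9, 0.9, 0.5, 0.9]
posterior = [p * l for p, l in zip(priors, likelihoods)]
z = sum(posterior)
print([round(p / z, 3) for p in posterior])  # mass shifts from #3 toward #4
```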

Comment by Davidmanheim on Strong Evidence is Common · 2021-03-15T07:30:09.166Z · LW · GW

"Worth having" is a separate argument about relative value of new information. It is reasonable when markets exist or we are competing in other ways where we can exploit our relative advantage. But there's a different mistake that is possible which I want to note.

Most extreme beliefs are false; for every correct belief, there are many, many extreme beliefs that are false. Strong consensus on some belief is (evidence for the existence of) strong evidence of the truth of that belief, at least among the considered alternatives. So picking a belief on the basis of extremity ("Most sheeple think X, so consider Y") is doing this the wrong way around, because extremity alone is negligible evidence of value. (Prosecutor's fallacy.)
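A quick Bayes check with made-up numbers shows why: if only one extreme belief in a thousand is true, then even evidence with a 100:1 likelihood ratio in its favor leaves the belief probably false:

$$\frac{P(\text{true}\mid E)}{P(\text{false}\mid E)} = \frac{100}{1} \times \frac{1}{999} \approx \frac{1}{10}$$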

What would make the claim that extremity isn't a useful indicator of value less valid? That is, when should extreme beliefs even be considered? 

I think the answer is when the evidence is both novel and cumulatively outweighs the prior consensus, or the belief is new / previously unconsidered. ("We went to the moon to inspect the landing site," not "we watched the same video again and it's clearly fake.") So we should only consider extreme beliefs, even on the basis of our seemingly overwhelming evidence, if the proposed belief is significantly newer than the extant consensus AND we have a strong argument that the evidence is not yet widely shared / understood.

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-15T07:10:03.502Z · LW · GW

I think we agree on this - iterated closing is an interesting idea, but I'm not sure it solves a problem. It doesn't help with ambiguity, since we can't find bounds. And earlier payouts are nice, but by the time we can do partial payouts, they are either tiny, because of large ranges, or they are not much before closing. (They also create nasty problems with incentive compatibility, which I'm unsure can be worked out cleanly.)

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-14T07:14:10.096Z · LW · GW

"partial resolution seems like it would be useful"
I hadn't thought of this originally, but Nuno added the category of "Resolve with a Probability," which does this. The idea of iterated closing of a question as the bounds improve is neat, but probably technically challenging. (GJ Inc. kind-of does this when they close answer options that are already certain to be wrong, such as total ranges below the current number of COVID cases.) I'd also worry it creates complexity that makes it much less clear to forecasters how things will work.

"one helpful mechanism might be to pay experts based on agreement with the majority of experts" 
Yes - this has been proposed under the same set of ideas as "meta-forecasts have also been proposed as a way to resolve very long term questions," though I guess it has clearer implications for otherwise ambiguous short term questions. I should probably include it. The key problem in my mind, which isn't necessarily fatal, is that it makes incentive compatibility into a fairly complex game-theoretic issue, with collusion and similar issues being possible.

"keeping evaluation methods secret can help avert Goodhart" 
Yes, I've definitely speculated along those lines. But for the post, I was worried that once I started talking about this as a Goodhart-issue, I would need to explain far more, and be very side-tracked, and it's something I will address more in the next post in any case.

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-12T07:25:42.151Z · LW · GW

Not sure that you'd get reactions from large subunits if they fold differently than the full spike - but my biochemistry/immunology isn't enough to be sure about how this would work.

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-12T07:23:57.603Z · LW · GW

Hence "(recent)"

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-01T08:16:00.553Z · LW · GW

 "Aside from the test result, we do have one more small piece of information to update on: I was quite congested for 1-2 days after the most recent three doses (and I was generally not congested the rest of the week). That's exactly what we'd expect to see if the vaccine is working as intended, and it's pretty strong evidence that it's doing something."


Agree that this is evidence it is doing something, but my strong prior is that the adjuvant alone (chitosan) would cause this to happen. 

I'm also unclear about why you chose the weekly schedule, or if you waited long enough to see any impact. (Not that the RadVac test would tell you anything.) The white paper suggests *at least* one week between doses, and suggests taking 3 doses, for healthy young adults. 

According to the white paper, you're likely to be protected, and I think continuing now would add danger without corresponding benefit. You said in the original post that you might continue dosing. I don't know enough to comment usefully about either immune tolerance or adjuvant hyperstimulation, but I suggest talking to an immunologist about those risks and how they change if you in fact continue and try "more dakka," since continuing to dose seems like it would increase those risks.

Strongly agree that ELISA tests are more valuable than more RadVac, and it would be at least moderate evidence one way or another. (But even if you can induce immune reactions to parts of the virus, it's unclear how much that would actually reduce your risk if infected.)

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-01T07:56:47.265Z · LW · GW

I agree that posting the results was the correct thing to do, and appreciate that John is trying to figure out if this is useful - but I actually claim the post is an example of how rationality is hard, and even pursuing it can be misleading if you aren't very, very careful.

In The Twelve Virtues of Rationality, this post gets virtue points for the first (curiosity, for looking into whether it works), third (lightness, being willing to update marginally on evidence), fourth (evenness, updating even when the evidence isn't in the direction desired), sixth (empiricism, actually testing something), and tenth (precision, specifying what was expected). But virtue is certainly not a guarantee of success, even for completely virtuous approaches. 

I think this tries to interpret data correctly, but falls short on the eleventh virtue, scholarship. For those who want to do Actual Science™, the first step is to know about the domain, and make sure your experiment is valid and useful. Going out and interacting with reality is valuable once your models are good enough to be able to interpret evidence. But Science is hard. (Perhaps not as hard as rationality, at least in some ways, but still very, very difficult.) In this case, without checking what the specific target being tested for was, as Christian notes, the data doesn't actually provide useful evidence. And if he had a (recent) asymptomatic case of COVID, the result would have been positive, which is evidence that the vaccine doesn't work, but would have been interpreted as evidence that it did.

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-01T07:38:15.230Z · LW · GW

You need to see if the spike peptide included corresponds to the antibody being tested for - and given how many targets there are, I would be surprised if it did.

Despite holding a far lower prior on efficacy, I'm agreeing with Christian - this evidence shouldn't be a reason to update anywhere nearly as strongly as you did against effectiveness.

Comment by Davidmanheim on Making Vaccine · 2021-02-28T09:49:01.126Z · LW · GW

Mostly vague "accidents and harmful unknown unknowns aren't that unlikely here" - because we have data on baseline success at "not have harmful side effects," and it is low. We also know that lots of important side effects are unusual, so the expected loss can be high even after a number of "successes," and this is doubly true because no-one is actually tracking side effects. We don't know much about efficacy either, but again, on base rates it is somewhat low. (Base rates for mRNA are less clear, and may be far higher - but these sequences are unfiltered, so I'm not sure even those base rates would apply.) 

Finally, getting the adjuvants to work is typically tricky for vaccines, and I'd be very concerned about making them useless, or inducing reactions to something other than the virus. But if you're asking about intentional misuse, that risk is relatively low. I would wonder about peanut protein being used to induce a new allergy, by priming your immune system to react to a new substance, but you'd need someone more expert than I am.

Overall, I'd be really happy taking bets that in 20 years, looking back with (hopefully) much greater understanding of mRNA vaccines, a majority of immunologists would respond to hearing details about this idea with a solid "that's idiotic, what the hell were those idiots thinking?" (If anyone wants to arrange details of this bet, let me know - it sounds like a great way to diversify and boost my expected retirement returns.)

Comment by Davidmanheim on How my school gamed the stats · 2021-02-24T12:21:25.178Z · LW · GW

Sorry, this is clearly much more confrontational than I intended.

Comment by Davidmanheim on How my school gamed the stats · 2021-02-24T08:21:12.732Z · LW · GW

First, I apologize. I really didn't intend for the tone to be attacking, and I am sorry that was how it sounded. I certainly wasn't intentionally "suggesting [you were] somehow trying to hide or deny" any of the issues. I thought it was worth noting that the initial characterization was plausibly misleading, given that the sole indicator of being a "nice middle class area" seemed to be percentage of people with PhDs. Your defense was that it was no more than 3x the number of PhDs, but that doesn't mean top 1/3, a point which you later agreed to. And after further discussion, I made and you checked an object level prediction I made, so I ceded the point.

Despite ceding the main earlier point, I continued the discussion, since I think the terms and definitions have gotten very confused by citing various incompatible sources and citing isolated sections of articles. And the same way that I have picked specific things to focus on responding to, you have picked many things I have said which you ignore. That's fine - but my responses were not an isolated demand for rigor; I have made concrete claims and acknowledged those which were refuted, and you have raised points which I have responded to. 

So again, I am not disputing your neighborhood, which I conceded I initially thought was more affluent than it is. Despite that, there is plenty you have now said characterizing classes, in responses, which I think should be clarified if you want to continue. Again, this doesn't reflect on your earlier claim about your child's school, but your defense of the position has been confusing to me, at least. For example, you compare $166,000/year in the US, which is the top 15% there, to incomes in your neighborhood - then note that £80k (i.e. $110k) in the UK is the top 5%. You don't say anything about the equivalent income in the UK. I again agree that your neighborhood is not in the top 15%, but the top-15% threshold in the UK is £46k (not $166k, i.e. £118k). The actual average income in your area, £36k, is in the top 25%. (I would suspect the incomes for those with children in the area are higher, but again, not near the top of the upper middle class.) Finally, the specific examples of upper middle class that you cite - Cameron, etc. - are discussing their family backgrounds, not their current status.

Comment by Davidmanheim on How my school gamed the stats · 2021-02-23T19:27:16.846Z · LW · GW

Wait, the claim was never that everyone is well off - of course we expect there to be a distribution. But if a sizeable portion of the children at the school have very high-socioeconomic-status parents (even if it's only 10% of parents, compared to a median of plausibly less than 1% across schools overall), it would be incorrect to infer that the way the school is run can be usefully compared to the "average" school.

Comment by Davidmanheim on The slopes to common sense · 2021-02-23T19:22:06.755Z · LW · GW

Great post. 

My only comment is that I think you're confused in section iv when you say, "but the origin of the universe is essentially an infinity of inferential steps away given the sheer scale of the issue," and I think that you're misunderstanding some tricky and subtle points about the epistemology of science and what inferential steps would be needed. So people might be right when they say you meant "We can't make any meaningful factual claims about the origin of the universe. We are too limited to understand an event like this." - but the object level claim is wrong. In fact, we can make empirical predictions in cosmology that can be falsified by evidence, and the predictions about the big bang, the cosmic microwave background, and uniformity were exactly such predictions.

Comment by Davidmanheim on How my school gamed the stats · 2021-02-23T13:13:44.034Z · LW · GW

That's fair - thanks for checking, and I'd agree that that would better match "very nice middle-class area" than my assertion. (In the US, the top 2-3% is usually considered upper class, while the next 15-20% are upper middle class, and the next ~25% are "lower middle class." This income level definitely puts your neighborhood in the middle of the upper middle class.)

Comment by Davidmanheim on How my school gamed the stats · 2021-02-23T08:48:50.912Z · LW · GW

I'd agree with most of your models, and agree that there is divergence at the extremes of a distribution - but that's at the very extremes, and usually doesn't lead to strong anti-correlation even in the extreme tails. 

But I think we're better off being more concrete. I don't know where you live, but I suspect that your postal code is around the 90th income percentile, after housing costs - a prediction which you can check easily. And that implies that the tails for income and education are still pretty well correlated at only the 97th percentile for education - implying the same about status more generally. (Or perhaps you think the people who attend the school are significantly less rich than the average in the area?)

Comment by Davidmanheim on How my school gamed the stats · 2021-02-22T11:59:46.245Z · LW · GW

Even given your numbers, I think it's very likely that you're underestimating how privileged the group is. Most things like educational status are Pareto-distributed; 80% of PhDs are in 20% of areas. While that assumption may be unfair, if it were correct, the point with 3x the average is in the 97th percentile.
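For what it's worth, the numbers do check out under that (admittedly unfair) assumption - a minimal sketch, using the Pareto shape parameter implied by the 80/20 rule:

```python
import math

alpha = math.log(5) / math.log(4)  # 80/20 rule: top 20% hold 80%, alpha ~1.16
mean = alpha / (alpha - 1)         # Pareto mean, taking the minimum x_m = 1
x = 3 * mean                       # an area with 3x the average PhD density
percentile = 1 - x ** -alpha       # Pareto CDF: P(X <= x) = 1 - (x_m/x)^alpha
print(round(100 * percentile, 1))  # ~97.2
```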

And yes, you're near Cambridge, which explains the concentration of PhDs, and makes it seem less elite compared to Cambridge itself, but doesn't change the class of the people compared to the country as a whole.

Comment by Davidmanheim on How my school gamed the stats · 2021-02-21T15:22:47.799Z · LW · GW

Note that only around 3% of UK residents have PhDs - so I strongly suspect that what you're calling "middle-class" is closer to the top 5% of the population, or what sociologists would say is the very upper part of the upper middle class. 

Comment by Davidmanheim on Promoting Prediction Markets With Meaningless Internet-Point Badges · 2021-02-12T14:07:41.504Z · LW · GW

Yes, it's super important to update frequently when the scores are computed as time-weighted. And for Metaculus, that's a useful thing, since viewers want to know what the current best guess is, but it's not the only way to do scoring. But saying frequent updating makes you better at forecasting isn't actually a fact about how accurate the individual forecasts are - it's a fact about how they are scored.

Comment by Davidmanheim on Making Vaccine · 2021-02-12T14:02:52.785Z · LW · GW

"Immunity" and "efficacy" seem like they should refer to the same thing, but they really don't. And if you talk to people at the FDA, or CDC, they should, and probably would, talk about efficacy, not immunity, when talking about these vaccines.

And I understand that the technical terms and usage aren't the same as what people understand, and I was trying to point out that for technical usage, the terms don't quite mean the things you were assuming. 

And yes, the vaccines have not been proven to provide immunizing protection - which again, is different than efficacy. (But the vaccines do almost certainly provide immunizing protection for some people, just based on the obvious prior information and the current data - though it's unclear how well they do so, and for how long after vaccination.) 

And, to make things worse, even efficacy is unclearly defined. It gets defined in each clinical trial - differently for each drug/vaccine/etc. - and I don't think it actually means the same thing for the currently approved COVID-19 vaccines. It's pretty similar, stopping symptomatic cases, but even given the same endpoint, it's not necessarily identical, since the studies picked how to measure the endpoints independently, and differently.

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T13:51:45.341Z · LW · GW

There was a LessWrong post about this a while back that I can't find right now, and I wrote a twitter thread on a related topic. I'm not involved with the reasoning behind the structure for GJP or Metaculus, so for both it's an outside perspective. However, I was recently told there is a significant amount of ongoing internal Metaculus discussion about the scoring rule, which, I think, isn't nearly as bad as it seemed. (But even if there is a better solution, changing the rule now would have really weird impacts on motivation of current users, which is critical to the overall forecast accuracy, and I'm not sure it's worthwhile for them.)

Given all of that, I'd be happy to chat, or even do a meetup on incentives for metrics and issues generally, but I'm not sure I have time to put together my thoughts more clearly in the next month. But I'd think Ozzie Gooen has even more to usefully say on the topic. (Thinking about it, I'd be really interested in being on or watching a panel discussion of the topic - which would probably make an interesting event.)

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T13:43:11.755Z · LW · GW

If the user is interested in getting into the top ranks, this strategy won't be anything like enough. And if not, but they want to maximize their score, the scoring system is still incentive compatible - they are better off reporting their true estimate on any given question. And for the worst (but still self-aware) predictors, this should be the Metaculus prediction anyways - so they can still come away with a positive number of points, but not many. Anything much worse than that, yes, people could have negative overall scores - which, if they've predicted on a decent number of questions, is pretty strong evidence that they really suck at forecasting.

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T13:38:02.862Z · LW · GW

Not really. Overall usefulness is really about something like covariance with the overall prediction - are you contributing different ideas and models. That would be very hard to measure, while making the points incentive compatible is not nearly as hard to do.

And how well an individual predictor will do, based on historical evidence, is found by comparing their Brier score to the Metaculus prediction on the same set of questions. This is information which users can see on their own page. But it's not a useful figure unless you're asking about relative performance, which as an outsider interpreting predictions, you shouldn't care about - because you want the aggregated prediction.

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T13:32:43.540Z · LW · GW

I agree that actually offering money would require incentives to avoid, essentially, sybil attacks. But making sure people don't make "noise predictions" isn't a useful goal - those noise predictions don't really affect the overall Metaculus prediction much, since it weights past accuracy.

 

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T11:44:26.207Z · LW · GW

As someone who is involved in both Metaculus and the Good Judgment Project, I think it's worth noting that Zvi's criticism of Metaculus - that points are given just for participating, so that making a community average guess gets you points - applies to Good Judgment Inc's predictions by superforecasters in almost exactly the same way - the superforecasters are paid for a combination of participation and their performance, so that guessing the forecast median earns them money. (GJI does have a payment system for superforecasters which is more complex than this, and which I probably am not allowed to talk about - but the central point remains true.)

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T11:30:56.245Z · LW · GW

I think that viewing it as a competition to place highly on the leaderboards is misleading, and perhaps even damaging.

I'd think the better framing for metaculus points is that they are like money - you are being paid to predict, on net, and getting more money is better. The fact that the leaderboard has someone with a billion points, because they have been participating for years, is kind-of irrelevant, and misleading.

In fact, I'd like to see Metaculus points actually be convertible to money at some point in some form - and yes, this would require a net cost (in dollars) to post a new question, with the pot of money divided in proportion to the total points gained on the question, and with negative points coming out of a user's balance. (And this would do a far better job aligning incentives on questions than the current leaderboard system, since for a leaderboard system, proper scoring rules for points are not actually incentive compatible.)
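A minimal simulation of that incompatibility (my own toy setup, assuming one binary question, an honest opponent, and a winner-takes-the-leaderboard payoff):

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.7                        # true probability of the event
opponent = 0.7                      # the other forecaster reports honestly
outcomes = rng.random(100_000) < p_true

for report in (0.7, 0.9, 1.0):
    mine = (report - outcomes) ** 2       # my Brier penalty in each world
    theirs = (opponent - outcomes) ** 2   # opponent's Brier penalty
    print(report, round(mine.mean(), 3), round((mine < theirs).mean(), 2))
```

Reporting 0.7 minimizes expected penalty (the proper-scoring guarantee), but any exaggerated report maximizes the chance of strictly beating the honest opponent, which is what a leaderboard rewards.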

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T11:24:09.211Z · LW · GW

What are you hoping to "win"? This isn't a market - you don't need your relative performance to be better than someone else's to have done well. And giving people points for guessing the community prediction is valuable, since it provides evidence that they don't have marginal information that causes them to believe something different. If people only predict when they are convinced they know significantly more than others, there would be far fewer predictions.

Comment by Davidmanheim on Covid 2/11: As Expected · 2021-02-12T11:20:17.499Z · LW · GW

You don't care, but if the goal is to motivate better communal predictions, giving people the incentive to do more predicting seems to make far more sense than having it normed to sum to zero, which would mean that in expectation you only gain points when you outperform the community.

Comment by Davidmanheim on Making Vaccine · 2021-02-11T13:41:56.224Z · LW · GW

In epidemiology / medicine, etc. "Immunizes" has a technical meaning - it means you cannot contract or carry the disease. (i.e. not that you don't get symptoms.)

Comment by Davidmanheim on The Upper Limit of Value · 2021-02-10T09:42:46.113Z · LW · GW

This seems confused in a bunch of ways, but I'm not enough of an expert in quantum mechanics, chaos theory, or teaching to figure out where you're confused. Anders might be able to help - but I think we'd need a far longer discussion to respond and explain this.

But to appeal to authority, when Scott Aaronson looked at the earlier draft, he didn't bring up any issues with quantum uncertainty as a concern, and when I check back in with him, I'll double check that he doesn't have any issues with the physics.

Comment by Davidmanheim on Promoting Prediction Markets With Meaningless Internet-Point Badges · 2021-02-09T19:46:18.526Z · LW · GW

Good to see so many of us moderately good forecasters are agreeing - now we just need to average then extremize the forecast of how good an idea this is. ;)

Comment by Davidmanheim on Promoting Prediction Markets With Meaningless Internet-Point Badges · 2021-02-09T17:18:27.591Z · LW · GW

For scoring systems, rather than betting markets, none of these particular attacks work. This is trivially true for the first and third attack, since you don't bet against individuals. And for any proper scoring rule, calibration-fluffing is worse than predicting your true odds for the dumb predictions. (Aligning incentives is still very tricky, but the set of attacks is very different.)
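To illustrate the properness claim, a minimal check (with an arbitrary belief) that under the Brier rule the expected penalty is minimized by reporting your true probability:

```python
import numpy as np

belief = 0.8                               # your true probability of the event
reports = np.linspace(0.01, 0.99, 99)      # candidate reported probabilities
# Expected Brier penalty of reporting r while believing p: p(r-1)^2 + (1-p)r^2
expected = belief * (reports - 1) ** 2 + (1 - belief) * reports ** 2
print(round(float(reports[expected.argmin()]), 2))  # 0.8: honesty is optimal
```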

Comment by Davidmanheim on Making Vaccine · 2021-02-09T06:12:43.093Z · LW · GW

First, base rates are critical. Looking at potential drugs overall, the rate of approval based on safety alone - i.e., moving from "Investigational New Drug" status to phase-II efficacy trials - is very low. Phase 1 trials are typically 80-100 people, and most drugs don't manage to make it past that stage. It would take much stronger evidence than I have seen to think that this vaccine is going to be outside of the norm.

Second, even if the process as described is safe, I can't imagine that greater than 99% of people manage to do this without screwing up in some serious way. That's less true of the LW crowd, but I don't think people are aware of how dumb the mistakes that get made are, or how much quality control matters, and how difficult it is to enforce for DIY projects.

Lastly, I'm well within the consensus for almost all the rest of the questions - I think it probably works in most cases, and I think it will have side effects in far fewer than 50% of cases.

(But another place I'm a bit outside the consensus is that I think it's unlikely to trigger standard antibody tests, since standard antibody tests are looking for antibodies against a specific part of the virus, and I'm unsure, reading the "Antibodies and B-cell immune response" section of the white paper, that standard tests would detect the elicited types of NABs.)

Comment by Davidmanheim on Promoting Prediction Markets With Meaningless Internet-Point Badges · 2021-02-08T20:54:46.804Z · LW · GW

There's a close analogue, which is getting accepted as a superforecaster by the Good Judgment Project by performing in the top 1%, I believe, on Good Judgment Open. (They have badges of some sort as well for superforecasters.) I'll also note that the top-X Metaculus score is a weird and not great metric to try to get people to maximize, because it rewards participation as well as accuracy - for example, you can get tons of points by just always guessing the Metaculus average, and updating frequently - though you'll never overtake the top people. And contra ike, as a rank 50-100 "metaculuser" who doesn't have time to predict on everything and get my score higher, I think we should privilege that distinction over all the people who rank higher than me on Metaculus. ;)

I will say that I think there's already a reasonable amount of prestige in certain circles for being a superforecaster, especially in EA- and LW-adjacent areas, though it's hard for me to disentangle how much prestige is from that versus other things I have been doing around the same time, like getting a PhD.