Posts

Maybe Antivirals aren’t a Useful Priority for Pandemics? 2021-06-20T10:04:08.425Z
A Cruciverbalist’s Introduction to Bayesian reasoning 2021-04-04T08:50:07.729Z
Systematizing Epistemics: Principles for Resolving Forecasts 2021-03-29T20:46:06.923Z
Resolutions to the Challenge of Resolving Forecasts 2021-03-11T19:08:16.290Z
The Upper Limit of Value 2021-01-27T14:13:09.510Z
Multitudinous outside views 2020-08-18T06:21:47.566Z
Update more slowly! 2020-07-13T07:10:50.164Z
A Personal (Interim) COVID-19 Postmortem 2020-06-25T18:10:40.885Z
Market-shaping approaches to accelerate COVID-19 response: a role for option-based guarantees? 2020-04-27T22:43:26.034Z
Potential High-Leverage and Inexpensive Mitigations (which are still feasible) for Pandemics 2020-03-09T06:59:19.610Z
Ineffective Response to COVID-19 and Risk Compensation 2020-03-08T09:21:55.888Z
Link: Does the following seem like a reasonable brief summary of the key disagreements regarding AI risk? 2019-12-26T20:14:52.509Z
Updating a Complex Mental Model - An Applied Election Odds Example 2019-11-28T09:29:56.753Z
Theater Tickets, Sleeping Pills, and the Idiosyncrasies of Delegated Risk Management 2019-10-30T10:33:16.240Z
Divergence on Evidence Due to Differing Priors - A Political Case Study 2019-09-16T11:01:11.341Z
Hackable Rewards as a Safety Valve? 2019-09-10T10:33:40.238Z
What Programming Language Characteristics Would Allow Provably Safe AI? 2019-08-28T10:46:32.643Z
Mesa-Optimizers and Over-optimization Failure (Optimizing and Goodhart Effects, Clarifying Thoughts - Part 4) 2019-08-12T08:07:01.769Z
Applying Overoptimization to Selection vs. Control (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 3) 2019-07-28T09:32:25.878Z
What does Optimization Mean, Again? (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 2) 2019-07-28T09:30:29.792Z
Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1) 2019-07-02T15:36:51.071Z
Schelling Fences versus Marginal Thinking 2019-05-22T10:22:32.213Z
Values Weren't Complex, Once. 2018-11-25T09:17:02.207Z
Oversight of Unsafe Systems via Dynamic Safety Envelopes 2018-11-23T08:37:30.401Z
Collaboration-by-Design versus Emergent Collaboration 2018-11-18T07:22:16.340Z
Multi-Agent Overoptimization, and Embedded Agent World Models 2018-11-08T20:33:00.499Z
Policy Beats Morality 2018-10-17T06:39:40.398Z
(Some?) Possible Multi-Agent Goodhart Interactions 2018-09-22T17:48:22.356Z
Lotuses and Loot Boxes 2018-05-17T00:21:12.583Z
Non-Adversarial Goodhart and AI Risks 2018-03-27T01:39:30.539Z
Evidence as Rhetoric — Normative or Positive? 2017-12-06T17:38:05.033Z
A Short Explanation of Blame and Causation 2017-09-18T17:43:34.571Z
Prescientific Organizational Theory (Ribbonfarm) 2017-02-22T23:00:41.273Z
A Quick Confidence Heuristic; Implicitly Leveraging "The Wisdom of Crowds" 2017-02-10T00:54:41.394Z
Most empirical questions are unresolveable; The good, the bad, and the appropriately under-powered 2017-01-23T20:35:29.054Z
Map:Territory::Uncertainty::Randomness – but that doesn’t matter, value of information does. 2016-01-22T19:12:17.946Z
Meetup : Finding Effective Altruism with Biased Inputs on Options - LA Rationality Weekly Meetup 2016-01-14T05:31:20.472Z
Perceptual Entropy and Frozen Estimates 2015-06-03T19:27:31.074Z
Meetup : Complex problems, limited information, and rationality; How should we make decisions in real life? 2013-10-09T21:44:19.773Z
Meetup : Group Decision Making (the good, the bad, and the confusion of welfare economics) 2013-04-30T16:18:04.955Z

Comments

Comment by Davidmanheim on The Best Textbooks on Every Subject · 2021-07-06T05:44:44.564Z · LW · GW

This is unfortunately defunct, replaced by another site on a different topic.

Comment by Davidmanheim on We Still Don't Know If Masks Work · 2021-07-05T09:22:45.063Z · LW · GW

It's not true that the only protection lost would be from asymptomatic people, though that would still be a big deal if a quarter of cases are asymptomatic and R is above 4, which it likely is in a population taking no other precautions. And even without masks, people who actively feel very sick often seek treatment and are diagnosed, and when not, mostly aren't going out in public much. But there are two other groups that matter:

  1. Presymptomatic spread is a big deal for COVID, and accounts for much of why it spreads quickly. That's why we saw such short serial transmission intervals. And if you don't eliminate the rapid spread, you're not getting much benefit from masks.
  2. Paucisymptomatic people, who have a slight runny nose or temperature and nothing else, are fairly common; they might not notice, or will assume it's not COVID since it's mild, and so spread the virus. (And this category partly overlaps with the previous one - people often start manifesting minor symptoms before they notice all of them.)

Comment by Davidmanheim on We Still Don't Know If Masks Work · 2021-07-05T09:16:23.112Z · LW · GW

Agreement reached!

Comment by Davidmanheim on We Still Don't Know If Masks Work · 2021-07-05T08:54:46.909Z · LW · GW

Mechanistic explanations are good for priors, but don't replace, much less refute, empirical evidence. If we see that there is ~0% impact in RCTs, the fact that we know better because it "must" decrease transmission isn't relevant. And the mechanistic model could be wrong in many ways. For example, maybe people wear masks too poorly to matter (they do!) Maybe masks only help if people never take them off to blow their nose, or scratch it, or similar (they do all those things, too.)

And we see that even according to the paper, the impact is pretty small, so it mostly refutes the claim made by the mechanistic model you propose - the impact just isn't that big, for various reasons. Which would imply that there is no way to know if it's materially above 0%.

In fact, I think it's likely that the impact is non-trivial in reducing transmission, but the OP is right that we don't have strong evidence.

Comment by Davidmanheim on Review of "Lifecycle Investing" · 2021-06-25T11:33:25.144Z · LW · GW

Based on this approach, the optimal equity allocation for younger folks is probably well over 100% - and this isn't particularly complicated to do, contra the statements in the article. Long-dated out-of-the-money stock index options are a viable retirement investment. I'd tell people to seriously consider buying out-of-the-money calls. As an illustrative example: a call struck at 120% of the future price, bought once a year, expiring 2-3 years away, with, say, 5% of your portfolio.

BUT - warning to readers: If you don't know / understand the argument I'm making, please don't just go buy stock options. Certainly don't spend more than a small portion of your long-term savings on them!

Comment by Davidmanheim on Maybe Antivirals aren’t a Useful Priority for Pandemics? · 2021-06-23T17:59:09.812Z · LW · GW

I think we agree - I'm certainly in favor of massive investments in surveillance and in PPE. The key question was whether I was missing something in the push for vaccines and antivirals, as if both were similarly promising.

Comment by Davidmanheim on Maybe Antivirals aren’t a Useful Priority for Pandemics? · 2021-06-23T17:56:49.936Z · LW · GW

I guess I could have cited more data on the claim that antivirals work poorly - but I wasn't trying to write an academic paper, and I don't think you cited anything that refutes my point. 

You seem unconvinced about how much this generalizes, so in addition to the obvious relative lack of efficacy for HIV, noted earlier, it might be somewhat useful to note that, AFAIK, the entire set of viral diseases we have antivirals for is HIV, HPV, Flu, Hep-B and C, and various herpesviruses (HCMV, HSV, VZV), and that the antivirals for most of these (HIV, Hepatitis, and the herpesviruses) seem to be used mainly to treat chronic disease by reducing viral load, rather than to cure the disease, and the remainder aren't particularly effective as cures.

Some, in fact, only seem to work in studies funded by their manufacturers. You, and others, claim that Neuraminidase inhibitors like tamiflu seem to work. Some people, like the people who wrote the Cochrane review, disagree. That's fine - evidently you know lots about this, and I only looked into it briefly, though the evidence seems at best shaky to me. And I'm not going to try to convince you, or write a paper on this. But I was asking for feedback and corrections, so thanks.

Comment by Davidmanheim on Maybe Antivirals aren’t a Useful Priority for Pandemics? · 2021-06-23T14:49:35.081Z · LW · GW

I'm not defending any institutions, or disagreeing with the point. But I mostly agree with your substantive claim, and I'm happy to talk about the question more  - elsewhere.

I'm simply telling you it's off topic. As the commenting guidelines should have made clear by now.
 

Comment by Davidmanheim on Maybe Antivirals aren’t a Useful Priority for Pandemics? · 2021-06-21T06:53:58.675Z · LW · GW

First, you don't want to assume the worst case and then plan for only that, or you won't have prepared for the less bad cases. You are advising not bothering to develop treatments and prophylactics because in the worst case they won't work. That seems obviously wrong, and not worth discussing.

Second, yes, we need surveillance and PPE, but these don't relate to my questions. And if we're concerned about bioengineered pandemics, the bioengineering will explicitly attempt to build around the known countermeasures, so I'm not sure why the first and second paragraphs are combined into a single comment.

Comment by Davidmanheim on Maybe Antivirals aren’t a Useful Priority for Pandemics? · 2021-06-21T06:50:26.808Z · LW · GW

There are a lot of points here, many of which I agree with, several of which I don't, but none seem to address the questions I asked or points I made.

To very briefly respond, 

First, yes, warning is critical, and being discussed, but doesn't relate to the 100 day plan, which was formulated in case there is spread, i.e. warning systems failed.

Second, the plan expressly addresses the issues with slow clinical trials, and building new institutions to handle that.

And third, laboratory origins and the database deletion are so far off from the point that I was thinking about deleting the comment.

Comment by Davidmanheim on Avoid News, Part 2: What the Stock Market Taught Me about News · 2021-06-15T07:52:19.027Z · LW · GW

As another example of the stock market showing that most news is garbage, there's a story I've told before from when I was working in finance. News reporters saw stock prices in retail drop in mid-2007, led by one particular large company, and they built a story around why. "Retail expectations," and similar post-hoc explanations led the headlines in all the financial outlets. 

Having seen exactly what happened, I can confidently say that the actual story was that a large hedge fund's options desk rolled the expiry of their long-dated call position (from 2008 to 2009, IIRC), and when all the counterparties rebalanced their delta hedges, it pushed prices down, which was then taken as a signal about other retailers, whose prices also dropped a bit. If you looked at the volumes and prices over the 10 minutes after we got the call asking to change the position, and the next ~2 hours for other retailers, it was really clear.

Comment by Davidmanheim on We need a standard set of community advice for how to financially prepare for AGI · 2021-06-08T12:01:51.700Z · LW · GW

The question is not whether you can build a portfolio where the expected gains conditional on AGI are positive; it's whether you can get enough of an advantage that it outweighs the costs of doing so, and in expectation outperforms the obvious alternative strategy of index funds. If you're purely risk-neutral, this is somewhat easier. Otherwise, the portfolio benefits of reducing the probability of losses are hard to beat.

You also may have cases where P(stock rises | AGI by date X)>>P(Stock rises), but P(stock falls | ~AGI  by date X) is high enough not to be worthwhile.
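
A rough sketch of that comparison in Python. Every probability and return below is a made-up placeholder, not an estimate, and the function name is hypothetical; the point is only that a position can do far better than the index conditional on AGI and still lose to it in expectation.

```python
def expected_return(p_agi, ret_if_agi, ret_if_no_agi):
    """Expected return of a position, mixing over AGI / no AGI by date X."""
    return p_agi * ret_if_agi + (1 - p_agi) * ret_if_no_agi


# Hypothetical inputs, for illustration only.
p_agi = 0.10  # assumed P(AGI by date X)

# A concentrated "AGI bet": large gain if AGI arrives, a loss otherwise.
agi_bet = expected_return(p_agi, ret_if_agi=3.00, ret_if_no_agi=-0.40)

# The obvious alternative: an index fund with modest gains either way.
index_fund = expected_return(p_agi, ret_if_agi=0.50, ret_if_no_agi=0.07)

print(f"AGI bet:    {agi_bet:+.3f}")
print(f"Index fund: {index_fund:+.3f}")
```

With these placeholder numbers the bet has a negative expected return while the index fund's is positive, and that is before accounting for risk aversion or trading costs.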

Comment by Davidmanheim on We need a standard set of community advice for how to financially prepare for AGI · 2021-06-07T11:53:34.924Z · LW · GW

We have so much uncertainty about pathways that I'm skeptical there is really any benefit here. If we knew enough to write such a guide, that would be great, but for reasons having nothing to do with our financial preparedness.

Comment by Davidmanheim on Search-in-Territory vs Search-in-Map · 2021-06-06T10:52:16.534Z · LW · GW

Note: I think that this is a better-written version of what I was discussing when I revisited selection versus control, here: https://www.lesswrong.com/posts/BEMvcaeixt3uEqyBk/what-does-optimization-mean-again-optimizing-and-goodhart (The other posts in that series seem relevant.)

I didn't think about the structure that search-in-territory / model-based optimization allows, but in those posts I mention that most optimization iterates back and forth between search-in-model and search-in-territory, and that a key feature which I think you're ignoring here is the cost of samples / iteration.

Comment by Davidmanheim on Scott Alexander 2021 Predictions: Buy/Sell/Hold · 2021-05-09T11:07:31.728Z · LW · GW

Further update: Happy I bought $100 at $0.31. Even happier I bought more at $0.17 when the market dropped as Netanyahu didn't form a coalition. Of course, Bennett/Lapid might still pull this off, but the market is now at $0.42.

Comment by Davidmanheim on Small and Vulnerable · 2021-05-04T12:03:48.034Z · LW · GW

Someone with no personal experience of suffering should also be moved by that consideration.

That sounds like a fantastic reason for someone with that experience to post it, as occurred here, as a way to explain what it is like to others.

In fact, only the existence of suffering for some concrete individual justifies the abstract conclusion of altruism. Without that concrete level, the abstraction is hypothetical, and should not provide the same level of reason to be altruistic.

Comment by Davidmanheim on Scott Alexander 2021 Predictions: Buy/Sell/Hold · 2021-05-03T12:29:24.206Z · LW · GW

Update: we did this, I bought shares, we'll see how it goes.

Comment by Davidmanheim on Strong Evidence is Common · 2021-05-03T06:59:42.222Z · LW · GW

Extreme, in this context, was implying far from the consensus expectation. That implies both "seen as radical" and "involving very high [consensus] confidence [against the belief]." 

Contra your first paragraph, I think, I claim that this "extremeness" is valid Bayesian evidence for it being false, in the sense that you identify in your third paragraph - it has low prior odds. Given that, I agree that it would be incorrect to double-count the evidence of being extreme. But my claim was that, holding "extremeness" constant, the newness of a claim was independent reason to consider it as otherwise more worthy of examination, (rather than as more likely,) since VoI was higher / the consensus against it is less informative. And that's why it doesn't create a loop in the way you suggested. 

So I wasn't clear in my explanation, and thanks for trying to clarify what I meant. I hope this explains better / refined my thinking to a point where it doesn't have the problem you identified - but if I'm still not understanding your criticism, feel free to try again.

Comment by Davidmanheim on Scott Alexander 2021 Predictions: Buy/Sell/Hold · 2021-04-29T09:13:50.985Z · LW · GW

Want to sell me USDC on there in exchange for paypal, so I can bet? (I'll gladly pay a 2% "commission" for, say, $200 in USDC.)

Comment by Davidmanheim on Scott Alexander 2021 Predictions: Buy/Sell/Hold · 2021-04-29T09:06:43.550Z · LW · GW

It's a pain to redo, but can someone add Ought embedded predictions to all of these?

https://forecast.elicit.org/binary

(Alternatively/additionally, can they all be on Metaculus?)

Comment by Davidmanheim on What topics are on Dath Ilan's civics exam? · 2021-04-29T06:14:18.511Z · LW · GW

Relatedly and perhaps even more fundamentally, the basic discipline of thinking about a system and implementing a mathematical model or simulation to explore these topics, which drove the insights you mention. And in many ways, it's easier to test without worrying about people gaming the system, because you can give new examples and require them to actually explore the question.

Comment by Davidmanheim on What topics are on Dath Ilan's civics exam? · 2021-04-29T06:11:49.186Z · LW · GW

That's fine, but choosing the question set to give the self-motivated children on which you provide the instant computer driven feedback is the same type of question; what is it that we want the child interested in X to learn?
 

Concretely, my 8 year old son likes math. He's fine with multiplication and division, but enjoys thinking about math. If I want him to be successful applying math later in life, should I start him on knot theory, pre-algebra equation solving, adding and subtracting unlike fractions, or coding in python? I see real advantages to each of these; proof-based thinking and abstraction from concrete to theoretical ideas, more abstract thinking about and manipulation of numbers, getting ahead of what he'll need next to continue at the math he will need to learn, or giving him other tools that will expand his ability to think and apply ideas, respectively.

I'd love feedback about which of these (or which combination of these) is most likely to ensure he's learning the things that are useful in helping him apply math in a decade, but I can't get useful feedback without trying it on large samples over the course of decades. Or, since I don't live in Dath Ilan, I can use my best judgement and ask others for feedback in an ad-hoc fashion.

Comment by Davidmanheim on What topics are on Dath Ilan's civics exam? · 2021-04-28T14:57:34.139Z · LW · GW

Partly agree with your criticism of the quoted claim, but there are two things I think you should consider.

First, evaluating tests for long-term outcomes is fundamentally hard. The extent to which a 5th grade civics or math test predicts performance in policy or engineering is negligible. In fact, I would expect that the feedback from test scores in determining what a child focuses on has a far larger impact on a child's trajectory than the object level prediction allows.

Second, standardizing tests greatly reduces cost of development, and allows larger sample sizes for validation. For either reason alone, it makes sense to use standardized tests as much as possible.

Comment by Davidmanheim on Scott Alexander 2021 Predictions: Buy/Sell/Hold · 2021-04-28T10:34:36.042Z · LW · GW

12. Netanyahu is still Israeli PM: 40%

This is the PredictIt line for him on 6/30, and Scott’s predicting this out to January 1. I’m guessing that he didn’t notice? Otherwise, given how many things can go wrong, it’s a rather large disagreement – those wacky Israelis have elections constantly. I’m going to sell this down to 30% even though I have system 1 intuitions he’s not going anywhere. Math is math. 

 

I would buy at this price, probably up to 50%, but there are some wrinkles to how it gets resolved. At least 45% of the population really really wants him as PM, and the other 55% doesn't have a favorite, but 2/3rds are very strongly opposed to Netanyahu. If he is temporarily no longer PM due to the sharing agreement during the run-up to another election, but then wins, does that resolve yes, or no? (This seems remarkably plausible.)

But as a disclaimer, I'm bad (and somewhat poorly calibrated) at predicting things I'm emotionally invested in, and, well...

Comment by Davidmanheim on LessWrong help desk - free paper downloads and more · 2021-04-19T09:12:43.559Z · LW · GW

I also just requested this on reddit

Comment by Davidmanheim on LessWrong help desk - free paper downloads and more · 2021-04-19T09:12:12.362Z · LW · GW

Also just requested on reddit: https://www.reddit.com/r/Scholar/comments/mtwl4d/chapter_k_hoskin_1996_the_awful_idea_of/

Comment by Davidmanheim on LessWrong help desk - free paper downloads and more · 2021-04-19T08:55:46.402Z · LW · GW

Request: "K. Hoskin (1996) The ‘awful idea of accountability’: inscribing people into the measurement of objects. In Accountability: Power, Ethos and the Technologies of Managing, R. Munro and J. Mouritsen (Eds). London, International Thomson Business Press, and references therein."

(Cited by: Strathern, Marilyn (1997). "'Improving ratings': audit in the British University system". European Review. John Wiley & Sons. 5 (3): 305–321. doi:10.1002/(SICI)1234-981X(199707)5:3<05::AID-EURO184>3.0.CO;2-4.)

See Google Books, and Worldcat (Available in many UK universities, incl. Cambridge & Oxford, and in the NYPL, at MIT, etc.) 


Context: Looking for sources about the history of Goodhart's law, esp. as "quoted"/ paraphrased, seemingly by Strathern.

From Strathern's paper:
"When a measure becomes a target, it ceases to be a good measure. The more a 2.1 examination performance becomes an expectation, the poorer it becomes as a discriminator of individual performances. Hoskin describes this as ‘Goodhart’s law’, after the latter’s observation on instruments for monetary control which lead to other devices for monetary flexibility having to be invented. However, targets that seem measurable become enticing tools for improvement."

Comment by Davidmanheim on Wanting to Succeed on Every Metric Presented · 2021-04-14T11:28:10.726Z · LW · GW

Noting the obvious connection to Goodhart's law - and elsewhere I've described the mistake of pushing to maximize easy-to-measure / cognitively available items rather than true goals.

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-04-03T19:23:20.068Z · LW · GW

Yeah, that's true. I don't recall exactly what I was thinking.

Perhaps it was regarding time-weighting, and the difficulty of seeing what your score will be based on what you predict - but the Metaculus interface handles this well, modulo early closings, which screw lots of things up. Also, log-scoring is tricky when you have both continuous and binary outcomes, since they don't give similar measures - being well calibrated for binary events isn't "worth" as much, which seems perverse in many ways.

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-04-01T05:41:48.790Z · LW · GW

In many cases, yes. But for some events, the "obvious" answers are not fully clear until well after the event in question takes place - elections, for example.

Comment by Davidmanheim on How many micromorts do you get per UV-index-hour? · 2021-03-31T17:04:21.658Z · LW · GW

About 20% of Americans develop skin cancer during their lifetime, and the 5-year overall survival rate for melanoma is over 90 percent. Taking this as the mortality risk, i.e. ignoring timing and varied risk levels, it's a 2% risk of (eventual) death.
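
Spelled out, the arithmetic behind that 2% looks like the sketch below. It uses only the two figures quoted above, treats "over 90 percent" as exactly 90% (so the result is closer to an upper bound under this simplification), and converts to micromorts using the standard definition of one micromort as a one-in-a-million chance of death. The variable names are mine.

```python
# Figures quoted in the comment above.
lifetime_incidence = 0.20   # "About 20% of Americans develop skin cancer"
five_year_survival = 0.90   # "over 90 percent", treated here as exactly 90%

# Simplification from the comment: ignore timing and varied risk levels,
# and treat (1 - survival) as the chance that a case is eventually fatal.
lifetime_death_risk = lifetime_incidence * (1 - five_year_survival)

# One micromort is a one-in-a-million chance of death.
micromorts = lifetime_death_risk * 1_000_000

print(f"Lifetime risk of death: {lifetime_death_risk:.1%}")
print(f"Equivalent micromorts:  {micromorts:,.0f}")
```

That is a lifetime total of roughly 20,000 micromorts; dividing it across UV-index-hours would need exposure data that isn't given here.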

But risk of skin cancer depends on far more than sun exposure - and the more important determinant is frequency of sunbathing below age 30. Other factors that seem to matter are skin color, skin response (how much you burn,) weight, and family history of cancers.
 

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-03-31T14:10:41.430Z · LW · GW

re: "Get this wrong" versus "the balance should be better," there are two different things that are being discussed. The first is about defining individual questions via clear resolution criteria, which I think is done well, and the second is about defining clear principles that provide context and inform what types of questions and resolution criteria are considered good form.

A question like "will Democrats pass H.R.2280 and receive 51 votes in the Senate" is very well defined, but super-narrow, and easily resolved "incorrectly" if the bill is incorporated into another bill, or if an adapted bill is proposed by a moderate Republican and passes instead, or passed via some other method, or if it passes but gets vetoed by Biden. But it isn't an unclear question, and given the current way that Metaculus is run, would probably be the best way of phrasing the question. Still, it's a sub-par question, given the principles I mentioned. A better one would be "Will a bill such as H.R.2280 limiting or banning straw purchases of firearms be passed by the current Congress and enacted?" It's much less well defined, but the boundaries are very different. It also uses "passed" and "enacted", which have gray areas. At the same time, the failure modes are closer to the ones that we care about near the boundary of the question. However, given the current system, this question is obviously worse - it's harder to resolve, it's more likely to be ambiguous because a bill that does only some of the thing we care about is passed, etc.

Still, I agree that the boundaries here are tricky, and I'd love to think more about how to do this better.

Comment by Davidmanheim on Systematizing Epistemics: Principles for Resolving Forecasts · 2021-03-30T18:37:09.642Z · LW · GW

I haven't said, and I don't think, that the majority of markets and prediction sites get this wrong. I think they navigate this without a clear framework, which I think the post begins providing. And I strongly agree that there isn't a slam-dunk-no-questions case for principles overriding rules, which the intro might have implied too strongly. I also agree with your point about downsides of ambiguity potentially overriding the benefits of greater fidelity to the intent of a question, and brought it up in the post. Still, excessive focus on making rules on the front end, especially for longer-term questions and ones where the contours are unclear, rather than explicitly being adaptive, is not universally helpful. 

And clarifications that need to change the resolution criteria mid-way are due to either bad questions, or badly handled resolutions. At the same time, while there are times that avoiding ambiguity is beneficial, there are also times when explicitly addressing corner cases to make them unambiguous ("if the data is discontinued or the method is changed, the final value posted using the current method will be used") makes the question worse, rather than better.

Lastly, I agree that one general point I didn't say, but agree with, was that "where the spirit and letter of a question conflict, the question should be resolved based on the spirit." I mostly didn't make an explicit case for this because I think it's under-specified as a claim. Instead, the three more specific claims I would make are: 
1) When the wording of a question seems ambiguous, the intent should be an overriding reason to choose an interpretation.
2) When the wording of a question is clear, the intent shouldn't change the resolution.

Comment by Davidmanheim on Thirty-three randomly selected bioethics papers · 2021-03-30T05:02:57.319Z · LW · GW

As an aside, I find it bizarre that Economics gets put at 9 - I think a review of what gets done in top econ journals would cause you to update that number down by at least 1. (It's not usually very bad, but it's often mostly useless.) And I think it's clear that lots of Econ does, in fact, have a replication crisis. (But we'll see if that is true as some of the newer replication projects actually come out with results.)

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-18T19:36:55.372Z · LW · GW

Generally agree that there's something interesting here, but I'm still skeptical that in most prediction market cases there would be enough money across questions, and enough variance in probabilities, for this to work well.

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-18T19:34:47.361Z · LW · GW

For betting markets, the market maker may need to manage the odds differently, and for prediction markets, it's because otherwise you're paying people in lower Brier scores for watching the games, rather than being good predictors beforehand. (The way that time-weighted Brier scores work is tricky - you could get it right, but in practice it seems that last minute failures to update are fairly heavily penalized.)
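
To make the "paid for watching the game" point concrete, here is a minimal sketch of one simple form of time weighting: a duration-weighted average of squared errors. This is an assumption for illustration, not necessarily the rule any particular platform uses, and the forecasts are hypothetical.

```python
def time_weighted_brier(forecasts, outcome):
    """Duration-weighted Brier score (lower is better).

    forecasts: list of (duration, probability) segments held in sequence.
    outcome:   1 if the event happened, 0 otherwise.
    """
    total_time = sum(duration for duration, _ in forecasts)
    weighted = sum(duration * (p - outcome) ** 2 for duration, p in forecasts)
    return weighted / total_time


outcome = 1  # the team the forecasters favored ends up winning

# Hypothetical forecaster A: a pre-game estimate held for all nine innings.
pregame_only = time_weighted_brier([(9, 0.60)], outcome)

# Hypothetical forecaster B: same pre-game estimate, then updates after
# watching seven innings of a game that is going the favorite's way.
watcher = time_weighted_brier([(7, 0.60), (2, 0.95)], outcome)

print(f"Pre-game only: {pregame_only:.3f}")
print(f"Watcher:       {watcher:.3f}")
```

Both forecasters had the same pre-game skill, but if the question stays open during the game, the one who watches gets the better score.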

Comment by Davidmanheim on Dark Matters · 2021-03-18T19:32:07.890Z · LW · GW

That's good to hear. But if "he started at 60%," that seems to mean if he "still thinks dark matter is overwhelmingly likely" he is updating in the wrong direction. (Perhaps he thought it was 60% likely that the LHC found dark matter? In which case I still think that he should update away from "overwhelmingly likely" - it's weak evidence against the hypothesis, but unless he started out almost certain, "overwhelmingly" seems to go a bit too far.)

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-16T18:24:35.971Z · LW · GW

Yes, that was exactly what I was thinking of, but 1) I didn't remember the name, and 2) I wanted a concrete example relevant to prediction markets.

And I agree it's hard to estimate in general, but the problem can still be relevant in many cases - which is why I used my example. In the baseball game, if the market closes before the game begins, we don't have a model as good as the market, but once the game is 7/9ths complete, we can do better than the pre-game market prediction.

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-16T06:06:18.011Z · LW · GW

It's an interesting idea, but one that seems to have very high costs for forecasters in keeping the predictions updated and coherent.

If we imagine that we pay forecasters the market value of their time, an active forecasting question with a couple dozen people spending a half hour each updating their forecast "costs" thousands of dollars per week. Multiplying that, even when accounting for reduced costs for similar questions, seems not worth the cost.
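
As a back-of-the-envelope version of that estimate: the forecaster count and half hour per update come from the paragraph above, while the hourly rate and the number of updates per week are assumptions I'm plugging in for illustration.

```python
def weekly_forecasting_cost(forecasters, hours_per_update, updates_per_week, hourly_rate):
    """Implied weekly cost of keeping one question's forecasts current."""
    return forecasters * hours_per_update * updates_per_week * hourly_rate


cost = weekly_forecasting_cost(
    forecasters=24,        # "a couple dozen people"
    hours_per_update=0.5,  # "a half hour each"
    updates_per_week=2,    # assumption: two update rounds per week
    hourly_rate=100.0,     # assumption: market value of time, in dollars
)

print(f"Implied cost per question: ${cost:,.0f} per week")
```

Even at one update round per week the figure is over a thousand dollars per question under these assumptions, which is why multiplying the number of questions looks expensive.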

Comment by Davidmanheim on Dark Matters · 2021-03-15T22:28:08.967Z · LW · GW

"isn't it quite odd that looking around at different parts of the universe seems to produce such a striking level of agreement on how much missing mass there is?"

But they don't. Dark matter, as a theory, posits that the amount of mass that "must be there somewhere" varies in amount and distribution in an ad-hoc fashion to explain the observations. I think it's likely that whatever is wrong with the theory, on the other hand, isn't varying wildly by where in the universe it is. Any such explanation would (need to) be more parsimonious, not less so.

And I agree that physics isn't obligated to make things easy to find - but when the dark matter theory was postulated, they guessed it was a certain type of WIMP, and then kept not finding it. Postulating that it must be there somewhere, and physics doesn't need to make it easy, isn't properly updating against the theory as each successive most likely but still falsifiable guess has been falsified.

Comment by Davidmanheim on Dark Matters · 2021-03-15T09:00:13.913Z · LW · GW

This was fantastic, and still leaves me with a conclusion that "dark matter" isn't a specific hypothesis, it's a set of reasons to think we're missing something in our theories which isn't modified gravity.

That is, saying "Given that everything we see is consistent with Gravity being correct, we conclude that there is not enough baryonic matter to account for what we see," doesn't prove the existence of large amounts of non-baryonic matter. Instead, the evidence provides strong indication that either A) there is something we can't see that has some properties of non-baryonic matter, or B) something is wrong with some theory which isn't what people propose as modified gravity. We knew enough to say that decades ago. We've looked for every type of non-baryonic matter we can think of, and have only been able to eliminate possibilities. The evidence is still pointing to "something else," and we have some actual or claimed physical objects that aren't actually prohibited as an answer - but nothing pointing to them.

This sounds a lot like what we would have said pre-special relativity about Galileo’s relativity (which Newton hated because it didn't allow distinguishing between relative and absolute motion,) and electromagnetism, which didn't seem to follow the rules for indistinguishability of relative and absolute motion, but was much too good of a theory to be wrong. 

Pre-Einstein, we had good reasons to think there was something that would let relativity and Maxwell be consistent, but didn't know what it was. They DID have reason to think the answer wasn't "Maxwell was wrong about the absolute speed limit for light," just like we know the answer isn't "gravity just works differently," but actually plugging the hole required a new conceptual model.

In Bayesian terms, we should have some prior on (#1) "gravity + visible mass," some prior on (#2) "modified gravity," some prior on (#3) "invisible mass that accounts for observation," and some prior on (#4) "something else will be found for a theory." Every piece of evidence seems like it's at least as strong evidence for #4 as it is for #3, and our continued lack of success finding the formerly favored candidates for what the invisible mass is made of is better evidence for #4 than #3.
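To make the shape of that update concrete, here is a minimal sketch in Python. Every number in it is an assumption chosen for illustration, not an estimate from physics: the priors are made up, and the only modeling choice is that a failed search for a leading candidate particle is somewhat less likely if invisible mass is the right answer than under the other three hypotheses.

```python
def update(beliefs, likelihoods):
    """Return the normalized posterior after one piece of evidence."""
    unnormalized = {h: beliefs[h] * likelihoods[h] for h in beliefs}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

# Hypothetical priors over the four hypotheses (assumed; they sum to 1).
prior = {
    "gravity + visible mass": 0.05,
    "modified gravity": 0.15,
    "invisible mass": 0.50,
    "something else": 0.30,
}

# Assumed likelihood of "we searched for the leading candidate and found
# nothing" under each hypothesis. Only "invisible mass" predicted a find.
null_search = {
    "gravity + visible mass": 1.0,
    "modified gravity": 1.0,
    "invisible mass": 0.7,
    "something else": 1.0,
}

posterior = dict(prior)
for _ in range(5):  # five successive failed searches
    posterior = update(posterior, null_search)

for h in prior:
    print(f"{h:24s} prior={prior[h]:.2f} posterior={posterior[h]:.2f}")
```

Under these assumed numbers, each null result shifts weight away from "invisible mass" and toward everything else, which is the sense in which repeated failed searches favor "something else" over "invisible mass" even when the rest of the evidence does not distinguish them.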

Comment by Davidmanheim on Strong Evidence is Common · 2021-03-15T07:30:09.166Z · LW · GW

"Worth having" is a separate argument about relative value of new information. It is reasonable when markets exist or we are competing in other ways where we can exploit our relative advantage. But there's a different mistake that is possible which I want to note.

Most extreme beliefs are false; for every correct belief, there are many, many extreme beliefs that are false. Strong consensus on some belief is (evidence for the existence of) strong evidence of the truth of that belief, at least among the considered alternatives. So picking a belief on the basis of extremity ("Most sheeple think X, so consider Y") is doing this the wrong way around, because extremity alone is negligible evidence of value. (Prosecutor's fallacy.)

What makes the claim that extremity isn't a useful indicator of value, less valid? That is, where should we think that extreme beliefs should even be considered? 

I think the answer is when the evidence is both novel and cumulatively outweighs the prior consensus, or the belief is new / previously unconsidered. ("We went to the moon to inspect the landing site," not "we watched the same video again and it's clearly fake.") So we should only consider extreme beliefs, even on the basis of our seemingly overwhelming evidence, if the proposed belief is significantly newer than the extant consensus AND we have a strong argument that the evidence is not yet widely shared / understood.

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-15T07:10:03.502Z · LW · GW

I think we agree on this - iterated closing is an interesting idea, but I'm not sure it solves a problem. It doesn't help with ambiguity, since we can't find bounds. And earlier payouts are nice, but by the time we can do partial payouts, they are either tiny, because of large ranges, or they are not much before closing. (They also create nasty problems with incentive compatibility, which I'm unsure can be worked out cleanly.)

Comment by Davidmanheim on Resolutions to the Challenge of Resolving Forecasts · 2021-03-14T07:14:10.096Z · LW · GW

"partial resolution seems like it would be useful"
I hadn't thought of this originally, but Nuno added the category of "Resolve with a Probability," which does this. The idea of iterated closing of a question as the bounds improve is neat, but probably technically challenging. (GJ Inc. kind-of does this when they close answer options that are already certain to be wrong, such as total ranges below the current number of COVID cases.) I'd also worry it creates complexity that makes it much less clear to forecasters how things will work.

"one helpful mechanism might be to pay experts based on agreement with the majority of experts" 
Yes - this has been proposed under the same set of ideas as "meta-forecasts have also been proposed as a way to resolve very long term questions," though I guess it has clearer implications for otherwise ambiguous short term questions. I should probably include it. The key problem in my mind, which isn't necessarily fatal, is that it makes incentive compatibility into a fairly complex game-theoretic issue, with collusion and similar issues being possible.

"keeping evaluation methods secret can help avert Goodhart" 
Yes, I've definitely speculated along those lines. But for the post, I was worried that once I started talking about this as a Goodhart-issue, I would need to explain far more, and be very side-tracked, and it's something I will address more in the next post in any case.

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-12T07:25:42.151Z · LW · GW

Not sure that you'd get reactions from large subunits if they fold differently than the full spike - but my biochemistry/immunology isn't enough to be sure about how this would work.

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-12T07:23:57.603Z · LW · GW

Hence "(recent)"

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-01T08:16:00.553Z · LW · GW

 "Aside from the test result, we do have one more small piece of information to update on: I was quite congested for 1-2 days after the most recent three doses (and I was generally not congested the rest of the week). That's exactly what we'd expect to see if the vaccine is working as intended, and it's pretty strong evidence that it's doing something."


Agree that this is evidence it is doing something, but my strong prior is that the adjuvant alone (chitosan) would cause this to happen. 

I'm also unclear about why you chose the weekly schedule, or if you waited long enough to see any impact. (Not that the RadVac test would tell you anything.) The white paper suggests *at least* one week between doses, and suggests taking 3 doses, for healthy young adults. 

According to the white paper, you're likely to be protected, and I think continuing now would add danger without corresponding benefit. You said in the original post that you might continue dosing. I don't know enough to comment usefully about either immune tolerance or adjuvant hyperstimulation, but I suggest talking to an immunologist about those risks and how they change if you in fact continue and try "more dakka," since continuing to dose seems like it would increase those risks.

Strongly agree that ELISA tests are more valuable than more RadVac, and it would be at least moderate evidence one way or another. (But even if you can induce immune reactions to parts of the virus, it's unclear how much that would actually reduce your risk if infected.)

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-01T07:56:47.265Z · LW · GW

I agree that posting the results was the correct thing to do, and appreciate that John is trying to figure out if this is useful - but I actually claim the post is an example of how rationality is hard, and even pursuing it can be misleading if you aren't very, very careful.

In The Twelve Virtues of Rationality, this post gets virtue points for the first (curiosity, for looking into whether it works,) third (lightness, being willing to update marginally on evidence,) fourth (evenness, updating even when the evidence isn't in the direction desired,) sixth (empiricism, actually testing something,) and tenth (precision, specifying what was expected.) But virtue is certainly not a guarantee of success, even for completely virtuous approaches. 

I think this tries to interpret data correctly, but falls short on the eleventh virtue, scholarship. For those who want to do Actual Science™, the first step is to know about the domain, and make sure your experiment is valid and useful. Going out and interacting with reality is valuable once your models are good enough to be able to interpret evidence. But Science is hard. (Perhaps not as hard as rationality, at least in some ways, but still very, very difficult.) In this case, without checking what the specific target being tested for was, as Christian notes, the data doesn't actually provide useful evidence. And if he had a (recent) asymptomatic case of COVID, the result would have been positive, which is evidence that the vaccine doesn't work, but would have been interpreted as evidence that it did.

Comment by Davidmanheim on RadVac Commercial Antibody Test Results · 2021-03-01T07:38:15.230Z · LW · GW

You need to see if the spike peptide included corresponds to the antibody being tested for - and given how many targets there are, I would be surprised if it did.

Despite holding a far lower prior on efficacy, I'm agreeing with Christian - this evidence shouldn't be a reason to update anywhere nearly as strongly as you did against effectiveness.

Comment by Davidmanheim on Making Vaccine · 2021-02-28T09:49:01.126Z · LW · GW

Mostly vague "accidents and harmful unknown unknowns aren't that unlikely here" - because we have data on baseline success at "not have harmful side effects," and it is low. We also know that lots of important side effects are unusual, so the expected loss can be high even after a number of "successes," and this is doubly true because no-one is actually tracking side effects. We don't know much about efficacy either, but again, on base rates it is somewhat low. (Base rates for mRNA are less clear, and may be far higher - but these sequences are unfiltered, so I'm not sure even those base rates would apply.) 
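As a rough illustration of why a run of "successes" says little about rare side effects, the sketch below computes the largest per-person event rate that is still consistent (at 95% confidence) with seeing zero events in n people. It assumes independent outcomes and that every event would actually be noticed and reported, which is optimistic given that nobody is tracking them; the sample sizes are hypothetical.

```python
def max_rate_consistent_with_zero_events(n_clean, confidence=0.95):
    """Upper bound on the per-person event rate after n_clean event-free cases.

    Solves (1 - p) ** n_clean = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_clean)

# Hypothetical numbers of people who took the vaccine with no reported problem.
for n in (10, 30, 100):
    bound = max_rate_consistent_with_zero_events(n)
    # The "rule of three" shortcut approximates the same bound as 3 / n.
    print(f"n={n:3d}  exact bound={bound:.3f}  rule of three={3 / n:.3f}")
```

So under those assumptions, thirty clean cases still cannot rule out a serious side effect hitting several percent of people.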

Finally, getting the adjuvants to work is typically tricky for vaccines, and I'd be very concerned about making them useless, or inducing reactions to something other than the virus. But if you want to know about intentional misuse, it's relatively low. I would wonder about peanut protein to induce you to develop a new allergy because you primed your immune system to react to a new substance, but you'd need someone more expert than I.

Overall, I'd be really happy taking bets that in 20 years, looking back with (hopefully) much greater understanding of mRNA vaccines, a majority of immunologists would respond to hearing details about this idea with a solid "that's idiotic, what the hell were those idiots thinking?" (If anyone wants to arrange details of this bet, let me know - it sounds like a great way to diversify and boost my expected retirement returns.)