Previously: Model Combination and Adjustment.
Very cool that you posted these quantified predictions in advance!
Nice write-up!
A few thoughts re: Scott Alexander & Rob Wiblin on prediction.
- Scott wrote that "On February 20th, Tetlock’s superforecasters predicted only a 3% chance that there would be 200,000+ coronavirus cases a month later (there were)." I just want to note that while this prediction did indeed fail badly, in a sense the supers were wrong by just two days. (WHO-counted cases only passed 200,000 on March 18th, two days before the question closed.)
- One interesting pre-coronavirus probabilistic forecast of global pandemic odds is this: From 2016 through Jan 1st 2020, Metaculus users made forecasts about whether there would be a large pandemic (≥100M infections or ≥10M deaths in a 12mo period) by 2026. For most of the question’s history, the median forecast was 10%-25%, and the special Metaculus aggregated forecast was around 35%. At first this sounded high to me, but then someone pointed out that 4 pandemics from the previous 100 years qualified (I didn't double-check this), suggesting a naive base rate of roughly a 40% chance per decade (see the arithmetic sketch after this list). So the median and aggregated forecasts on Metaculus were actually lower than the naive base rate (maybe by accident, or maybe because forecasters adjusted downward given today's better surveillance and mitigation tools?), but I'm guessing they were still higher than the probabilities most policymakers and journalists would've given if they were in the habit of making quantified, falsifiable forecasts. Moreover, the Tetlockian strategy of just predicting the naive base rate with minimal adjustment would've yielded an even more impressive in-advance prediction of the coronavirus pandemic.
- More generally, the research on probabilistic forecasting makes me suspect that prediction polls/markets with highly-selected participants (e.g. via GJI or HyperMind), or perhaps even those without highly-selected participants (e.g. via GJO or Metaculus), could achieve pretty good calibration (though not necessarily resolution) on high-stakes questions (e.g. about low-probability global risks) with 2-10 year time horizons, though this has not yet been checked.
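To make the base-rate arithmetic explicit, here's a minimal sketch (my own illustration, not anything from Metaculus; the "4 qualifying pandemics in 100 years" count is assumed, not verified):

```python
import math

# Assumed inputs: 4 qualifying pandemics in the previous 100 years (unverified),
# and a roughly 10-year question window (2016 through 2026).
pandemics_last_century = 4
years_observed = 100
horizon_years = 10

# Naive frequency-based rate, scaled to the question horizon.
rate_per_year = pandemics_last_century / years_observed
expected_events = rate_per_year * horizon_years  # 0.4

# The "naive base rate" quoted above just reads this expected count as a probability.
naive_base_rate = expected_events  # 40%

# A slightly more careful version: probability of at least one pandemic,
# assuming arrivals follow a Poisson process with the same rate.
p_at_least_one = 1 - math.exp(-expected_events)  # ~33%

print(f"naive base rate: {naive_base_rate:.0%}")
print(f"Poisson P(at least one in {horizon_years} years): {p_at_least_one:.0%}")
```

Either way, the Metaculus median (10%-25%) sat below the historical frequency.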
Nice post. Were there any sources besides Wikipedia that you found especially helpful when researching this post?
If the U.S. had kept racing in its military capacity after WW2, it may have been able to use its negotiating leverage to stop the Soviet Union from becoming a nuclear power: halting proliferation and preventing the build-up of world-threatening numbers of high-yield weapons.
BTW, the most thorough published examination I've seen of whether the U.S. could've done this is Quester (2000). I've been digging into the question in more detail and I'm still not sure whether it's true or not (but "may" seems reasonable).
I'm very interested in this question, thanks for looking into it!
My answer from 2017 is here.
Interesting historical footnote from Louis Francini:
This issue of differing "capacities for happiness" was discussed by the classical utilitarian Francis Edgeworth in his 1881 Mathematical Psychics (pp 57-58, and especially 130-131). He doesn't go into much detail at all, but this is the earliest discussion of which I am aware. Well, there's also the Bentham-Mill debate about higher and lower pleasures ("It is better to be a human being dissatisfied than a pig satisfied"), but I think that may be a slightly different issue.
Cases where scientific knowledge was in fact lost and then rediscovered provide especially strong evidence about the discovery counterfactuals, e.g. Hero's aeolipile and al-Kindi's development of relative frequency analysis for decoding messages. We probably underestimate how common such cases are, because the knowledge of the lost discovery is itself lost — e.g. we might easily have simply never rediscovered the Antikythera mechanism.
Apparently Shelly Kagan has a book coming out soon that is (sort of?) about moral weight.
This scoring rule has some downsides from a usability standpoint. See Greenberg 2018, a whitepaper prepared as background material for a (forthcoming) calibration training app.
Some other people at Open Phil have spent more time thinking about two-envelope effects than I have, and fwiw some of their thinking on the issue is in this post (e.g. see section 1.1.1.1).
My own take on this is described briefly here, with more detail in various appendices, e.g. here.
Yes, I meant to be describing ranges conditional on each species being moral patients at all. I previously gave my own (very made-up) probabilities for that here. Another worry to consider, though, is that many biological/cognitive and behavioral features of a species are simultaneously (1) evidence about their likelihood of moral patienthood (via consciousness), and (2) evidence about features that might affect their moral weight *given* consciousness/patienthood. So, depending on how you use that evidence, it's important to watch out for double-counting.
I'll skip responding to #2 for now.
For anyone who is curious, I cite much of the literature arguing over criteria for moral patienthood/weight in the footnotes of this section of my original moral patienthood report. My brief comments on why I've focused on consciousness thus far are here.
Cool, this looks better than I'd been expecting. Thanks for doing this! Looking forward to next round.
Hurrah failed project reports!
One of my most-used tools is very simple: an Alfred snippet that lets me paste-as-plain-text using Cmd+Opt+V.
Thanks!
From a user's profile, be able to see their comments in addition to their posts.
Dunno about others, but this is actually one of the LW features I use the most.
(Apologies if this is listed somewhere already and I missed it.)
Probably not suitable for launch, but given that the epistemic seriousness of the users is the most important "feature" for me and some other people I've spoken to, I wonder if some kind of "user badges" system might be helpful, especially if it influences the weight that upvotes and downvotes from those users carry. E.g. one badge could be "has read >60% of the sequences, as 'verified' by one of the 150 people the LW admins trust to verify such a thing about someone," another could be "verified superforecaster," and probably there are others I'm not immediately thinking of.
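As a toy sketch of the vote-weighting part (entirely hypothetical; the badge names and weights are made up, and this isn't how LW's karma actually works):

```python
# Hypothetical badge-weighted voting. Badge names and weights are invented
# for illustration only.

BADGE_WEIGHTS = {
    "read_most_of_sequences": 1.5,
    "verified_superforecaster": 2.0,
}

def vote_weight(voter_badges: set[str]) -> float:
    """A voter's vote counts more if they hold trusted badges."""
    weight = 1.0
    for badge in voter_badges:
        weight *= BADGE_WEIGHTS.get(badge, 1.0)
    return weight

def weighted_score(votes: list[tuple[set[str], int]]) -> float:
    """votes: (voter's badges, +1 or -1) pairs -> badge-weighted total."""
    return sum(vote_weight(badges) * direction for badges, direction in votes)

# One upvote from a verified superforecaster outweighs one ordinary downvote.
print(weighted_score([({"verified_superforecaster"}, +1), (set(), -1)]))  # 1.0
```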
- Constantly.
- Frequently.
Thanks for briefly describing those Doctor Who episodes.
Lists of textbook award winners, like this one, might also be useful.
Fixed, thanks.
Today I encountered a real-life account of the chain story — involving a cow rather than an elephant — around 24:10 into the "Best of BackStory, Vol. 1" episode of the podcast BackStory.
"Accuracy-boosting" or "raising accuracy"?
Source. But the non-cached page says "The details of this job cannot be viewed at this time," so maybe the job opening is no longer available.
FWIW, I'm a bit familiar with Dafoe's thinking on the issues, and I think it would be a good use of time for the right person to work with him.
Hi Rick, any updates on the Audible version?
See also: https://scholar.google.com/scholar?cluster=9557614170081724663&hl=en&as_sdt=1,5
Just donated!
Hurray!
Any chance you'll eventually get this up on Audible? I suspect that in the long run, it can find a wider audience there.
Another attempt to do something like this thread: Viva la Books.
I guess subjective logic is also trying to handle this kind of thing. From Jøsang's book draft:
Subjective logic is a type of probabilistic logic that allows probability values to be expressed with degrees of uncertainty. The idea of probabilistic logic is to combine the strengths of logic and probability calculus, meaning that it has binary logic’s capacity to express structured argument models, and it has the power of probabilities to express degrees of truth of those arguments. The idea of subjective logic is to extend probabilistic logic by also expressing uncertainty about the probability values themselves, meaning that it is possible to reason with argument models in presence of uncertain or incomplete evidence.
Though maybe this particular formal system has really undesirable properties, I don't know.
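For concreteness, here's a minimal sketch of a binomial opinion in subjective logic, based on my reading of Jøsang's (belief, disbelief, uncertainty, base rate) representation; the class and method names are my own invention, not an existing library's:

```python
from dataclasses import dataclass

@dataclass
class BinomialOpinion:
    """A subjective-logic opinion about a single proposition x.

    belief + disbelief + uncertainty should sum to 1; base_rate is the
    prior probability assigned to x in the absence of evidence.
    """
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        assert abs(total - 1.0) < 1e-9, "b + d + u must equal 1"

    def projected_probability(self) -> float:
        # Jøsang's projection: P(x) = b + a * u.
        return self.belief + self.base_rate * self.uncertainty

# A dogmatic opinion (zero uncertainty) behaves like an ordinary probability...
confident = BinomialOpinion(belief=0.7, disbelief=0.3, uncertainty=0.0, base_rate=0.5)
# ...while a vacuous opinion (all uncertainty) falls back on the base rate.
ignorant = BinomialOpinion(belief=0.0, disbelief=0.0, uncertainty=1.0, base_rate=0.5)

print(confident.projected_probability())  # 0.7
print(ignorant.projected_probability())   # 0.5
```

The appeal is that the second opinion carries its ignorance explicitly rather than hiding it inside a bare point probability.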
Donated $300.
Never heard of him.
For those who haven't been around as long as Wei Dai…
Eliezer tells the story of coming around to a more Bostromian view, circa 2003, in his coming of age sequence.
Just FYI, I plan to be there.
Any idea when the book is coming out?
Just FYI to readers: the source of the first image is here.
I don't know if this is commercially feasible, but I do like this idea from the perspective of building civilizational competence at getting things right on the first try.
Might you be able to slightly retrain so as to become an expert on medium-term and long-term biosecurity risks? Biological engineering presents serious GCR risk over the next 50 years (and of course after that, as well), and very few people are trying to think through the issues on more than a 10-year time horizon. FHI, CSER, GiveWell, and perhaps others each have a decent chance of wanting to hire people into such research positions over the next few years. (GiveWell is looking to hire a biosecurity program manager right now, but I assume you can't acquire the requisite training and background immediately.)
I think it's partly not doing enough far-advance planning, but also partly just a greater-than-usual willingness to Try Things that seem like good ideas even if the timeline is a bit rushed. That's how the original minicamp happened, which ended up going so well that it inspired us to develop and launch CFAR.
People have complained about Sumatra not working with MIRI's PDF ebooks, too. It was hard enough already to get our process to output the links we want on most readers, so we decided not to make the extra effort to additionally support Sumatra. I'm not sure what it would take.
Updated, thanks!
Fixed, thanks.
Maybe just use odds ratios. That's what I use when I'm trying to make updates on the spot.
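In case it's useful to anyone, here's what that looks like as a toy calculation (numbers made up):

```python
# Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

prior_prob = 0.25                                # made-up prior credence
prior_odds = prior_prob / (1 - prior_prob)       # 1:3

likelihood_ratio = 4.0   # the evidence is (say) 4x as likely if the hypothesis is true

posterior_odds = prior_odds * likelihood_ratio          # 4:3
posterior_prob = posterior_odds / (1 + posterior_odds)  # ~0.57

print(f"{prior_prob:.0%} -> {posterior_prob:.0%}")      # 25% -> 57%
```

The convenience is that likelihood ratios multiply, so you can chain several pieces of evidence in your head and only convert back to a probability at the end.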
Working on MIRI's current technical agenda mostly requires a background in computer science with an unusually strong focus on logic: see details here. That said, the scope of MIRI's research program should be expanding over time. E.g. see Patrick's recent proposal to model goal stability challenges in a machine learning system, which would require more typical AI knowledge than has usually been the case for MIRI's work so far.
MIRI's research isn't really what a mathematician would typically think of as "math research" — it's more like theory-heavy computer science research with an unusually significant math/logic component, as is the case with a few other areas of computer science research, e.g. program analysis.
Also see the "Our recommended path for becoming a MIRI research fellow" section on our research fellow job posting.