Posts

Forecasting Newsletter: July 2020. 2020-08-01T17:08:15.401Z · score: 21 (4 votes)
Forecasting Newsletter. June 2020. 2020-07-01T09:46:04.555Z · score: 26 (7 votes)
Forecasting Newsletter: May 2020. 2020-05-31T12:35:58.063Z · score: 8 (5 votes)
Forecasting Newsletter: April 2020 2020-04-30T16:41:35.849Z · score: 21 (6 votes)
What are the relative speeds of AI capabilities and AI safety? 2020-04-24T18:21:58.528Z · score: 8 (4 votes)
Some examples of technology timelines 2020-03-27T18:13:19.834Z · score: 23 (9 votes)
[Part 1] Amplifying generalist research via forecasting – Models of impact and challenges 2019-12-19T15:50:33.412Z · score: 53 (13 votes)
[Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration 2019-12-19T15:49:45.901Z · score: 48 (12 votes)
What do you do when you find out you have inconsistent probabilities? 2018-12-31T18:13:51.455Z · score: 16 (6 votes)
The hunt of the Iuventa 2018-03-10T20:12:13.342Z · score: 11 (5 votes)

Comments

Comment by nunosempere on Forecasting Newsletter: July 2020. · 2020-08-02T10:11:32.147Z · score: 1 (1 votes) · LW · GW

Thanks.

> The major friction for me is that some of the formatting makes it feel overwhelming. Maybe use bold headings instead of bullet points for each new entry? Not sure.

Fair point; will consider.

Comment by nunosempere on ozziegooen's Shortform · 2020-08-01T15:35:05.989Z · score: 10 (4 votes) · LW · GW

> The name comes straight from the Latin though

From the Greek, as it happens. Also, "alethephobia" would be a double negative, since a-letheia means a state of not being hidden; a more natural neologism would avoid that double negative. Further, the Greek concept of truth differs in some ways from our own conceptualization. Bad neologism.

Comment by nunosempere on Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns · 2020-07-23T13:48:38.921Z · score: 5 (4 votes) · LW · GW

Notes

  • The field of AGI research plausibly commenced in 1956 with the Dartmouth conference. What happens if one uses Laplace's rule? Then it is a priori pretty implausible that it will happen soon, given that it hasn't happened in the decades since (see the Laplace's rule sketch after this list).

  • How do information cascades work in this context? How many researchers would I expect to have read, and to recall, a reward gaming list (1, 2, 3, 4)?

  • Here is A list of good heuristics that the case for AI x-risk fails. I'd expect that these heuristics, being pretty good ones, will keep steering AGI researchers away from considering x-risks.

  • Rohin probably doesn't actually have enough information or enough forecasting firepower to predict at 0.1% that it hasn't happened, and be calibrated. He probably does have the expertise, though. I did some experiments a while ago, and "I'd be very surprised if I were wrong" translated for me into roughly 95%. YMMV.

  • An argument would go: "The question looks pretty fuzzy to me, with many moving parts. Long tails are good in that case, and other forecasters who have found some small piece of evidence are over-updating." Some quotes:

    There is strong experimental evidence, however, that such self-insight is usually faulty. The expert perceives his or her own judgmental process, including the number of different kinds of information taken into account, as being considerably more complex than is in fact the case. Experts overestimate the importance of factors that have only a minor impact on their judgment and underestimate the extent to which their decisions are based on a few major variables. In short, people's mental models are simpler than they think, and the analyst is typically unaware not only of which variables should have the greatest influence, but also which variables actually are having the greatest influence. (Source: Psychology of Intelligence Analysis, Chapter 5)

    Our judges in this study were eight individuals, carefully selected for their expertise as handicappers. Each judge was presented with a list of 88 variables culled from the past performance charts. He was asked to indicate which five variables out of the 88 he would wish to use when handicapping a race, if all he could have was five variables. He was then asked to indicate which 10, which 20, and which 40 he would use if 10, 20, or 40 were available to him.

    We see that accuracy was as good with five variables as it was with 10, 20, or 40. The flat curve is an average over eight subjects and is somewhat misleading. Three of the eight actually showed a decrease in accuracy with more information, two improved, and three stayed about the same. All of the handicappers became more confident in their judgments as information increased. (Source: Behavioral Problems of Adhering to a Decision Policy)

    • I'm not sure to what extent this is happening with forecasters here: finding a particularly interesting and unique nugget of information and then over-updating. I'm also not sure to what extent I actually believe that this question is fuzzy and so long tails are good.
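
Since the first note above appeals to Laplace's rule, here is a minimal sketch of that calculation, assuming a uniform prior and one trial per year since 1956; the ~64-year figure and the horizons are my own illustrative choices, not numbers taken from the question.

```python
# Minimal sketch of Laplace's rule of succession (uniform prior, one
# "trial" per year since 1956). All numbers are illustrative.

def p_event_within(k_years: int, n_failures: int) -> float:
    """After n consecutive failures, the rule of succession gives
    P(at least one success in the next k trials) = k / (n + k + 1)."""
    return k_years / (n_failures + k_years + 1)

n = 2020 - 1956  # ~64 years in which the event has not happened
for horizon in (1, 5, 10, 30):
    print(f"P(within {horizon} years) ~ {p_event_within(horizon, n):.1%}")
```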

Here is my first entry to the competition. Here is my second and last entry to the competition. The main change is that I've assigned some probability (5%; I'd personally assign 10%) to it having already happened.

Some notes about that distribution:

  • Note that this is not my actual distribution; it is my guess as to how Rohin will update.
  • My guess doesn't move Rohin's distribution much; I expect that Rohin will not in fact change his mind a lot.
  • In fact, this is not exactly my guess as to how Rohin will update. That is, I'm not maximizing expected accuracy; I'm ~maximizing the chance of getting first place (subject to spending little time on this).

Some quick comments for other forecasters:

  • I think that the distinction between the forecaster's beliefs and Rohin's is being neglected. Some of the snapshots predict huge updates, which really don't seem likely.

Comment by nunosempere on Life at Three Tails of the Bell Curve · 2020-06-28T18:25:51.253Z · score: 4 (3 votes) · LW · GW

Thanks for this post.

Comment by nunosempere on An online prediction market with reputation points · 2020-06-14T09:28:57.394Z · score: 7 (3 votes) · LW · GW

Hey! I think this is cool. May I suggest "How many people in Kings County, NY, will be confirmed to have died from COVID-19 during September?" as a question?

I have a forecasting newsletter with ~150 subscribers; I'll make sure to mention this post when it gets sent at the end of this month.

Comment by nunosempere on What are the best tools for recording predictions? · 2020-05-25T08:26:00.171Z · score: 3 (2 votes) · LW · GW

Foretold has a public API; requests can be made to it from anything that can send HTTP requests. This would require some work.
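
To illustrate what "some work" might look like, here is a minimal sketch of posting a query to a GraphQL endpoint with Python's requests library. The endpoint URL and the query shape below are placeholders made up for illustration, not Foretold's documented schema.

```python
# Hypothetical sketch only: the endpoint URL and query below are
# placeholders, not taken from Foretold's documentation.
import requests

ENDPOINT = "https://example.com/graphql"  # stand-in for the real API URL

query = """
query {
  measurables(first: 10) {
    edges { node { id name } }
  }
}
"""

response = requests.post(ENDPOINT, json={"query": query}, timeout=30)
response.raise_for_status()
print(response.json())
```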

Comment by nunosempere on What are the best tools for recording predictions? · 2020-05-25T08:19:19.992Z · score: 7 (2 votes) · LW · GW

Personally, I've used Foretold, Google Sheets, CSVs, an R script, and my own bash script (PredictResolveTally), which writes to a CSV.

I like my own setup best (it works at the five-second level), but I think you'd be better off just using a CSV and then analyzing your results every so often with the programming language of your choice. For the analysis part, this is a Python library I'm looking forward to using.
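
For concreteness, here is a minimal sketch of that CSV workflow using only the standard library; the file name and the column layout (statement, probability, resolved) are arbitrary choices for illustration, not a recommendation of any particular format.

```python
# Minimal sketch: log predictions to a CSV, then check calibration later.
import csv
from collections import defaultdict

LOG = "predictions.csv"  # columns: statement, probability, resolved (0/1)

def record(statement: str, probability: float) -> None:
    """Append a prediction; leave the 'resolved' column empty for now."""
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow([statement, probability, ""])

def calibration() -> None:
    """Group resolved predictions into 10%-wide buckets and compare the
    stated probability with the observed frequency."""
    buckets = defaultdict(list)
    with open(LOG, newline="") as f:
        for statement, prob, resolved in csv.reader(f):
            if resolved in ("0", "1"):
                buckets[round(float(prob), 1)].append(int(resolved))
    for p, outcomes in sorted(buckets.items()):
        hit_rate = sum(outcomes) / len(outcomes)
        print(f"{p:.0%} bucket: {hit_rate:.0%} came true ({len(outcomes)} predictions)")

record("It rains here tomorrow", 0.7)
calibration()
```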

Comment by nunosempere on Assessing Kurzweil predictions about 2019: the results · 2020-05-13T07:41:57.638Z · score: 1 (1 votes) · LW · GW

Browsing Wikipedia, I found that a similar effort was the 1985 book Tools for Thought (available here), though I haven't read it.

Comment by nunosempere on What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world? · 2020-05-06T11:04:46.235Z · score: 3 (2 votes) · LW · GW

> As an heavy predictit user

Could you say more about this? What is your ranking in PredictIt / what is your track record? In particular, GJOpen doesn't expect Trump to win.

Comment by nunosempere on Forecasting Newsletter: April 2020 · 2020-05-06T09:09:06.160Z · score: 2 (2 votes) · LW · GW

This might be of interest: https://www.kill-the-newsletter.com/.

Comment by nunosempere on Forecasting Newsletter: April 2020 · 2020-05-01T19:27:27.747Z · score: 3 (2 votes) · LW · GW

I'll let you know. 30%-ish.

Comment by nunosempere on Are there any prediction markets for Covid-19? · 2020-04-23T13:51:42.391Z · score: 1 (1 votes) · LW · GW

Now there is a pure version of what you were looking for: Corona Information Markets.

Comment by nunosempere on Poll: ask anything to an all-knowing demon edition · 2020-04-23T13:45:49.879Z · score: 2 (2 votes) · LW · GW

Consider the futures of humanity which I would, upon reflection, endorse as among the best of utopias, and consider the simplest Turing Machines which encode them. If you apply (some function which turns their states after n steps into real numbers and concatenates them), would the output of that calculation belong to (this randomly chosen half of the real numbers)?

I'm sure this can be worded more carefully, but right now this may force the oracle to simulate all the futures of humanity which I would consider to be among the best of utopias.

Comment by nunosempere on The Unilateralist’s “Curse” Is Mostly Good · 2020-04-16T16:52:08.644Z · score: 3 (2 votes) · LW · GW

Machiavelli's The Prince, and various other texts.

Comment by nunosempere on Are there any prediction markets for Covid-19? · 2020-04-13T16:29:52.631Z · score: 5 (3 votes) · LW · GW

predict.replicationmarkets.com will be looking into predicting viable treatments in the future, according to their last newsletter, but right now it's mostly hypothetical. Rewards are monetary.

pandemic.metaculus.com has several prediction tournaments with monetary prizes. Example.

foretold.io has two active covid communities. No monetary prizes, but predictions are used by epidemicforecasting.org.

gjopen.com has plenty of covid questions, but the only reward is reputational.

Augur.net may have some markets, but they're transitioning to v2.0 and markets are temporarily unavailable.

Comment by nunosempere on On characterizing heavy-tailedness · 2020-02-16T22:21:46.172Z · score: 6 (3 votes) · LW · GW

Can you give some more intuitions as to why allowing finite support is among your criteria?

I can imagine a definition which, lacking this criterion, is still useful, and requiring infinite support might be a useful reminder that 0 and 1 are not probabilities (or probability densities). Further, whereas requiring infinite support might risk analyzing absurd outcomes, it may also allow us to consider, and thus reach, maximally great futures.

> Reality is inherently bounded - I can confidently assert that there is no possible risk today that would endanger a trillion lives, because I am confident the number of people on the planet is well below that.

Consider that the number of animal lives is probably greater than one trillion, and you didn't specify *human* lives. You could also consider future lives, or abstruse moral realism theories. Your definition of personhood (moral personhood?) could change. Having finite support considered harmful (?).

Comment by nunosempere on ozziegooen's Shortform · 2020-01-09T11:19:56.811Z · score: 5 (3 votes) · LW · GW

Here is another point by @jacobjacob, which I'm copying here in order for it not to be lost in the mists of time:

> Though just realised this has some problems if you expected predictors to be better than the evaluators: e.g. they're like "once the event happens everyone will see I was right, but up until then no one will believe me, so I'll just lose points by predicting against the evaluators" (edited)
>
> Maybe in that case you could eventually also score the evaluators based on the final outcome… or kind of re-compensate people who were wronged the first time…

Comment by nunosempere on ozziegooen's Shortform · 2020-01-08T13:20:39.439Z · score: 3 (2 votes) · LW · GW

Another point in favor of such a set-up would be that aspiring superforecasters get much, much more information when they can see roughly the prediction a superforecaster would have made with the same information: a distribution rather than just a point. I'd expect that this means that market participants would get better, faster.

Comment by nunosempere on ozziegooen's Shortform · 2020-01-08T12:41:47.263Z · score: 1 (1 votes) · LW · GW

> This is somewhat solved if you have a forecaster that you trust that can make a prediction based on Sophia's seeming ability and honesty. The naive thing would be for that forecaster to predict their own distribution of the log-loss of Sophia, but there's perhaps a simpler solution. If Sophia's provided loss distribution is correct, that would mean that she's calibrated in this dimension (basically, this is very similar to general forecast calibration). The trusted forecaster could forecast the adjustment made to her term, instead of forecasting the same distribution. Generally this would be in the direction of adding expected loss, as Sophia probably had more of an incentive to be overconfident (which would result in a low expected score from her) than underconfident. This could perhaps make sense as a percentage modifier (-30% points), a mean modifier (-3 to -8 points), or something else.

Is it actually true that forecasters would find it easier to forecast the adjustment?

Comment by nunosempere on ozziegooen's Shortform · 2020-01-08T12:25:53.868Z · score: 1 (1 votes) · LW · GW

> We could also say that if we took a probability distribution of the chances of every possible set of findings being true, the differential entropy of that distribution would be 0, as smart forecasters would recognize that inputs_i is correct with ~100% probability.

In that paragraph, did you mean to say "findings_i is correct"?

***

Neat idea. I'm not sure whether the idea is valuable because it could actually be implemented, or because "this is interesting because it gets us better models".

In the first case, I'm not sure whether the correlation is strong enough to change any decisions. That is, I'm having trouble thinking of decisions for which I need to know the generalizability of something, and my best shot is measuring its predictability.

For example, in small foretold/metaculus communities, I'd imagine that miscellaneous factors like "is this question interesting enough to the top 10% of forecasters" will just make the path predictability -> differential entropy -> generalizability difficult to detect.

Comment by nunosempere on 2020's Prediction Thread · 2020-01-01T15:48:33.380Z · score: 1 (1 votes) · LW · GW

> No state will secede from the US. 95%

This seems underconfident?

I have different intuitions for both:

> No one will have won a Nobel Prize in Physics for their work on string theory. 80%

and

> No US President will utter the words "Existential risk" in public during their term as president. 65%

But for both of these, I'd expect that looking into them for a couple of hours would change my mind. For the second one, the Google Ngram page for "existential risk" is interesting, but it sadly only goes up to the year 2008.

Comment by nunosempere on 2020's Prediction Thread · 2019-12-31T14:32:01.397Z · score: 7 (5 votes) · LW · GW

Is anyone accepting bets on their predictions?

Comment by nunosempere on What is a reasonable outside view for the fate of social movements? · 2019-01-08T11:40:03.625Z · score: 35 (7 votes) · LW · GW

My method was reading the Wikipedia page and answering the following questions:

1. Was the movement successful as a community?

  • 0: nope
  • 1: to some extent / ambiguous.
  • 2: clearly yes.

2. Did the movement produce the change in the world which it said it wanted?

  • 0: nope
  • 1: not totally a failure / had some minor victories / ambiguous.
  • 2: clearly yes.

3. Was it successful at changing laws? | Was that its intent?

4. Is it fringe (0), minority (1) or mainstream (2)?

5. Bias: how sympathetic am I to this movement?

  • 0: I am unsympathetic.
  • 1: I am not unsympathetic
  • 2: I like them a lot.

I feel that, for the amount of effort I'm spending on this, I'm going to have to rely on my gut feeling at some point, and that the Pareto-principle thing to do is to have well-defined questions.

In case I or someone else wants to develop this further, a way to improve on question 2 would be:

  • a) Identify the three most important objectives the movement claims to have.
  • b) For each, to what extent has it been achieved?

I excluded "Salt March" because I saw it as doublecounting "Nonviolence", and excluded "Reform movements in the United States" because it was too broad a category. I kept "Student Movements", though.

Anyways, you can find a .csv table with the results here or a Google Drive link here. I might play around with the results further, but for the moment:

Socially, the average movement does pretty well, with an average of 1.3/2, distributed as: 16% are 0s, 36% are 1s, and 48% are 2s. With regard to effectiveness, the average is 0.72/2, distributed as: 44% are 0s, 40% are 1s, and 16% are 2s.
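
As a sketch of how those percentages could be recomputed from the linked table: the file name and column names below are assumptions on my part, since the actual layout of the .csv isn't shown in this comment.

```python
# Sketch of recomputing the summary statistics above from the results table.
# The file name and column names are assumptions, not the real CSV layout.
import csv
from collections import Counter

with open("social_movements.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for column in ("community_success", "effectiveness"):
    scores = [int(row[column]) for row in rows]
    counts = Counter(scores)
    print(f"{column}: average {sum(scores) / len(scores):.2f}/2")
    for value in (0, 1, 2):
        print(f"  {value}s: {counts[value] / len(scores):.0%}")
```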


Comment by nunosempere on What is a reasonable outside view for the fate of social movements? · 2019-01-07T11:47:16.187Z · score: 16 (4 votes) · LW · GW

I'm on this.

Comment by nunosempere on We Agree: Speeches All Around! · 2018-06-17T18:59:33.406Z · score: 3 (1 votes) · LW · GW

Is that talk worth listening to in full?

Comment by Radamantis on [deleted post] 2017-12-17T19:23:26.289Z

Thank you for your polite reply.

> This means that the tree that falls in the forest doesn't truly make a sound because there's nobody around to have the insight that it makes a sound.

Precisely! This is really unsatisfactory. However, it is still sometimes useful to think in those terms, to not distinguish between knowledge and truth, or to ignore truth and focus on knowledge. The question "How can I find a way to rethink the following insight in terms of maps and territories?" is not rhetorical, and wasn't meant as a dismissal: I really do have a hard time rephrasing something like that in terms other than that the student is beginning to grok, or beginning to develop a relationship with, European History in the same way that he might develop a relationship with a friend. I understand that this might be a crutch, and that is why I asked the question.

By joint-carvey ontologies I mean ontologies that carve reality at its joints. Divisions that point at something significant.

The middle half of your commentary leaves me confused, because I don't see what prompted it.