Posts

Beware boasting about non-existent forecasting track records 2022-05-20T19:20:03.854Z
Assigning probabilities to metaphysical ideas 2021-09-08T00:15:02.215Z
A quick and crude comparison of epidemiological expert forecasts versus Metaculus forecasts for COVID-19 2020-04-01T13:59:13.230Z

Comments

Comment by Jotto999 on I got dysentery so you don’t have to · 2024-10-25T17:04:06.822Z · LW · GW

Thank you for your service.

Comment by Jotto999 on How much do you believe your results? · 2023-06-14T20:25:19.696Z · LW · GW

Strong upvoted because I doubt the implications will be adequately appreciated in all of LW/EA.  Some cause ideas are astronomically noisy.  Sometimes almost deliberately, in the service of "finding" the "highest potential" areas.

Above some (unknown) point, the odds they're somehow confused or exaggerating should rise faster than further increments up the ostensible value.  I'm sure they'll claim to have insight strong enough to pull it back to the top of the EV curve.  That doesn't seem credible, even though I expect the optimal effort to put into those causes is >0, and even though their individual arguments are often hard to argue against.

Comment by Jotto999 on AGI in sight: our look at the game board · 2023-02-19T22:48:47.177Z · LW · GW

1. AGI is happening soon. Significant probability of it happening in less than 5 years.

[Snip]

We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

"AGI" here is undefined, and so is "significant probability".  When I see declarations in this format, I downgrade my view of the epistemics involved.  Reading stuff like this makes me fantasize about not-yet-invented trading instruments, without the counterparty risk of social betting, and getting your money.

Comment by Jotto999 on In defense of the MBTI · 2023-02-07T06:07:50.894Z · LW · GW

This framing is neither meaningful nor useful.  All 3 of those are ambiguous.

The point of any of this is to better predict human behavior, and better describe variation in behavior between people.  That's the value pitch that society might plausibly get from taxonomizing personality tendencies.  These taxonomies should be updated based on actual data, and on whatever makes them more predictive -- not just speculation.

So for example, when HEXACO began distinguishing Honesty-Humility from Agreeableness, that wasn't done because someone speculated that a 6th trait made sense to them.  Including more languages in the lexical studies resulted in a 6th factor emerging from the factor analysis.  So it's a more representative depiction of the clusterings than the Big Five.

Also, e.g. H-H is more predictive of workplace deviance than the old Big Five Agreeableness trait was.  That's an example of why anyone might plausibly care about adding that 6th category.  Differentiating Disagreeableness from Dark Triad might plausibly be useful, and anyone who thinks that's useful can now use HEXACO.  Progress.

Your suggestion that we can use MBTI to "improve" Big Five is funny to people familiar with the literature.  Sticking to MBTI is going WAY back to something much more crude, and much less supported by data.  It's like saying you're going to improve 21st century agriculture with an ox and a plow.

Similarly, your proposed change to Big Five is highly unlikely to improve it.  E.g.:

So, for example, for our question about whether people naturally think in terms of what other people think about something or think in terms of how they think about things, we would have that be the extroverted thinking vs the introverted thinking cognitive function (or Te/Ti for short).

You have little reason to think this is even a good description of personality clustering.  But the behaviors are probably captured by some parts of Extroversion and Agreeableness.

I think you should just go learn about the modern personality psychology field; it's not helpful to spend time pitching improvements if you're using a framework that's 80 years behind.  We talked about this on Manifold, and I think you're kind of spinning in circles.  You don't need to do this -- just go learn the superior stuff and don't look back.

Comment by Jotto999 on What fact that you know is true but most people aren't ready to accept it? · 2023-02-04T02:13:30.835Z · LW · GW

I confess I don't know what it means to talk about a person's value as a soul.  I am very much in that third group I mentioned.

On an end to relative ability: is this outcome something you give any significant probability to? And if there existed some convenient way to make long-term bets on such things, what sorts of bets would you be willing to make?

Comment by Jotto999 on What fact that you know is true but most people aren't ready to accept it? · 2023-02-04T01:24:35.622Z · LW · GW

There is intense censorship of some facts about human traits and biology.  Of the variance in intelligence and economic productivity, the percent attributable to genetic factors is >0%.  But almost nobody prestigious, semi-prestigious -- nor anything close -- can ever speak of those facts without social shaming.  You'd probably be shamed before you even got to the question of phenotypic causation -- speaking as if the g factor exists would often suffice.  (Even though the g factor is an unusually solid empirical finding; in fact I can hardly think of a more reliable one from the social sciences.)

But with all the high-functioning and prestigious people filtered out, the topic is then heavily influenced by people who have something wrong with them.  Such as having an axe to grind with a racial group.  Or people who like acting juvenile.  Or a third group that's a bit too autistic to easily relate to the socially-accepted narratives.  I'll give you a hint: the first 2 groups rarely know enough to format the question in a meaningful way, such as "variance attributable to genes", and instead often ask "if it's genetic", which is a meaningless format.

The situation is like an epistemic drug prohibition, where the empirical insights aren't going anywhere, but nobody high-functioning or good can be the vendor.  The remaining vendors have a disproportionate number of really awful people.

I should've first learned about the Wilson effect on IQ from a liberal professor.  Instead I first heard it mentioned from some guy with an axe to grind with other groups.  I should've been conditioned with prosocial memes that don't pretend humans are exempt from the same forces that shape dogs and guppies.  Instead it's memes predicting any gaps would trend toward 0 given better controls for environment (which hasn't been the trend for many years -- the recent magnitudes are similar despite improving sophistication, and many interventions didn't replicate).  The epistemics of this whole situation are egregiously dysfunctional.

I haven't read her book, but I know Kathryn Paige Harden is making an attempt.  So hats off to her.

Comment by Jotto999 on Projecting compute trends in Machine Learning · 2022-12-10T16:53:00.918Z · LW · GW

Sorry if I'm just misreading -- in Compute Trends Across Three Eras of Machine Learning it was shown that the doubling time (at that time) had slowed to every ~10 months for the large-scale projects.  In this projection you go with a 6-month doubling time for some number of years, then slowing to every 20 months (the gap between those assumptions is sketched below).  My questions are:

  1. What would the results be like if we assumed things had already slowed to 10 months?
  2. Is 6 months likely to be a better description of the upcoming computation doubling times for the next few years, versus 10 months? If yes, why?
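
To make the gap concrete -- this is just a toy calculation of my own, with made-up horizons, not figures from the post:

```python
# Compare cumulative compute growth under different doubling times (in months).
def growth_factor(months_elapsed, doubling_time_months):
    return 2 ** (months_elapsed / doubling_time_months)

for years in (2, 4, 6):
    months = 12 * years
    print(f"{years} years: "
          f"6-month doubling -> {growth_factor(months, 6):,.0f}x, "
          f"10-month doubling -> {growth_factor(months, 10):,.0f}x")
```
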
Comment by Jotto999 on A common failure for foxes · 2022-11-12T21:38:55.453Z · LW · GW

This was a contorted and biased portrayal of the topic.  If you're a reader in a hurry, skip to my last paragraph.

First, this needs clarification on who you mean by a "fox", and who you don't.  There's a very high risk of confusion, or talking about unrelated things without noticing.  It may help if you name 5 people you consider to be foxes, and 5 you consider to be hedgehogs.

For the rest of this comment, I'm going to restrict "fox" to "good-scoring generalist forecaster", because they would tend to be quite fox-like, in the Tetlockian sense, and you did mention placing probabilities.  If there are non-forecasters you would include in your taxonomy for fox, you are welcome to mention them.  As an occasional reminder of potential confusion about this, I'll often put "fox" in quotation marks.

Paying more attention to easily-evaluated claims that don't matter much, at the expense of hard-to-evaluate claims that matter a lot.

E.g., maybe there's an RCT that isn't very relevant, but is pretty easily interpreted and is conclusive evidence for some claim. At the same time, maybe there's an informal argument that matters a lot more, but it takes some work to know how much to update on it, and it probably won't be iron-clad evidence regardless

This point has some truth to it, but it misses a lot.

When forecasters pitch ideas for questions, they tend to be interested in whether the question "really captures the spirit" of the topic.  Forecasters are well aware of e.g. Goodhart's Law and measurement issues; it's on our minds all the time and often discussed.  We do find it much more meaningful to forecast things that we think matter.  The format makes it possible to make progress on that.  It happens to take effort.

If a single stream of data (or criterion) doesn't adequately capture it, but if the claim actually corresponds to some future observations in any way, then you can add more questions from other angles.  By creating a "basket" from different measures, a progressively-clearer picture can be drawn.  That is, if the topic is worth the effort.

An example of this is the accumulated variety of AI-related questions on Metaculus.  Earlier attempts were famously unsatisfying, but the topic was important.  There is now a FAR better basket of measures from many angles.  And I'm sure it will continue to improve, such as by finding new ways to measure "alignment" and its precursors.

It's possible for "foxes" to actually practice this, and make the claim more evaluable.  It's a lot of work, which is why most topics don't get this.  Also this is still a very niche hobby with limited participation.  Prediction markets are literally banned.  If they weren't, they'd probably grow like an invasive weed, with questions about all sorts of things.

Although you don't explicitly say hedgehogs do a better job of including and evaluating the hard-to-evaluate claims, this seems intimately related.  The people who are better at forecasting than me tend to also be very discerning at other things we can't forecast.  In all likelihood these two things are correlated.

I'm most sympathetic to the idea that many topics have inadequate "coverage", in the sense that it's laborious to make things amenable to forecasting.  I agree lots of forecasting questions are irrelevant, or in your example, may focus on an RCT too much.

But you don't really make a case for why foxes would be worse off in this way.  As far as I can tell, hedgehogs get easily fixated on lots of irrelevant details all the time.  The way you describe this seems actively biased, and I'm disappointed that such a prolific poster on the site would have such a bias.

1. A desire for cognitive closure, confidence, and a feeling of "knowing things" — of having authoritative Facts on hand rather than mere Opinions.

[snip]

But real-world humans (even if they think of themselves as aspiring Bayesians) are often uncomfortable with uncertainty. We prefer sharp thresholds, capital-k Knowledge, and a feeling of having solid ground to rest on.

I found this surprising.  Hedgehogs are famously more prone to this than foxes.  Their discomfort with uncertainty (and desire for authoritative facts) tends to make them bad forecasters.

Granted, forecasters are human too, and we feel more comfortable when certain.  And it is true that we use explicit probabilities -- we do that so our beliefs are more transparent, even though it's inconvenient to us.  I can see how this relates to fixating on specific information.  We even get pretty irate when a question "resolves ambiguous", dashing our efforts like a failed replication.

But hedgehogs tend to be utterly convinced, epistemically slippery, and incredibly opinionated.  If you like having authoritative facts and feeling certainty, just be a hedgehog with One Big Idea.  And definitely stay the hell away from forecasting.

As above, this point would've been far more informative if you tried making a clear comparison against hedgehogs, and what this tends to look like in them.  Surely "foxes" can fixate on a criterion for closure, but how does this actually compare with hedgehogs? Do you actually want to make a genuine comparison?

2. Hyperbolic discounting of intellectual progress.

With unambiguous data, you get a fast sense of progress. With fuzzy arguments, you might end up confident after thinking about it a while, or after reading another nine arguments; but it's a long process, with uncertain rewards.

I don't believe you here.  Hedgehogs are free to self-reinforce in whatever direction they want, with certainty, as fast as they want.  You know what's a really slow, tedious way to feel intellectual progress? Placing a bunch of forecasts and periodically checking on them.  And being forced to tediously check potential arguments to update in various ways, which we're punished for not doing (unlike a hedgehog).  It seems far more tedious than sticking to my favorite One Big Idea.

The only way this might be true is that forecasting often focuses on short-term questions, so we can get that feedback, and also because it's much more attainable.  Though we do have lots of long-term questions too, we know they're far more difficult and we'll often be dart-throwing chimps.  But nothing about your posts seems to really deal with this.

Also a deep point that I might have already told you somewhere else, and seems like a persistent confusion, so I'm going to loudly bold it here:

Forecasters think about and aggregate lots of fuzzy things.

Let me repeat that:

Forecasters think about and aggregate lots of fuzzy things.  All the time.

We do this all the time! The substantial difference is we get later scored on whether we evaluated the fuzzy things (and also non-fuzzy-things) properly.

It's compression.  If any of those fuzzy things actually make a difference to the observable outcomes, then we actually get scored on whether we did a good job of considering those fuzzy things.  "Foxes" do this all the time, probably better than hedgehogs, on average.

I'll elaborate with a concrete example.  Suppose I vaguely overhear a nebulous rumor that Ukraine may use a dirty bomb against Russia.  I can update my forecast on that, even if I can't directly verify the rumor.  Generally you shouldn't update very much on fuzzy things though because they are very prone to being unfounded or incorrect.  In that particular example I made a small update, correctly reflecting that it's fuzzy and poorly-substantiated.  People actively get better at incorporating fuzzy things as they build a forecasting practice, we're literally scored on how well we do this.  Which Rob Bensinger would understand better if he did forecasting.
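
As a toy illustration of what a "small update" can look like (the numbers here are invented for the example, not my actual forecast on that question): treat the rumor as weak evidence with a likelihood ratio barely above 1, and the posterior barely moves.

```python
def bayes_update(prior, likelihood_ratio):
    """Update a probability on evidence with a given likelihood ratio, via odds form."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# An unverified rumor might only be, say, 1.2x more likely in worlds where the event happens.
print(bayes_update(0.05, 1.2))  # ~0.059 -- a small shift, as befits fuzzy, poorly-substantiated evidence
```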

Hedgehogs are free to use fuzzy things to rationalize whatever they want, with little to slow them down, beyond the (weak and indirect) social checks they'll have on whether they considered those fuzzy things well enough.

3. Social modesty and a desire to look un-arrogant.

It can feel socially low-risk and pleasantly virtuous to be able to say "Oh, I'm not claiming to have good judgment or to be great at reasoning or anything; I'm just deferring to the obvious clear-cut data, and outside of that, I'm totally uncertain."

...To the extent I see "foxes" do this, it has usually been a good thing.  Also, your wording of "totally uncertain" sounds mildly strawmanny.  They don't usually say that.  When "outside the data", people are often literally talking about unrelated things without even noticing, but a seasoned forecaster is more likely to notice this.  In such cases, they might sometimes say "I'm not sure" -- partly out of not knowing what else is being asked exactly, and partly out of genuine uncertainty.

This point would be a lot more impactful if you gave examples, so we know you're not exaggerating and this is a real problem.

Collecting isolated facts increases the pool of authoritative claims you can make, while protecting you from having to stick your neck out and have an Opinion on something that will be harder to convince others of, or one that rests on an implicit claim about your judgment.

But in fact it often is better to make small or uncertain updates about extremely important questions, than to collect lots of high-confidence trivia. It keeps your eye on the ball, where you can keep building up confidence over time; and it helps build reasoning skill.

Seriously? Foxes actually make smaller updates more often than hedgehogs do.

Hedgehogs collect facts and increase the pool of authoritative claims they can make, while protecting from having to stick their necks out and risk being wrong.  Not looking wrong socially, but being actually-wrong about what happens.

This point seems just wrong-headed, as if you were actively trying to misportray the topic.

High-confidence trivia also often poses a risk: either consciously or unconsciously, you can end up updating about the More Important Questions you really care about, because you're spending all your time thinking about trivia.

Even if you verbally acknowledge that updating from the superficially-related RCT to the question-that-actually-matters would be a non sequitur, there's still a temptation to substitute the one question for the other. Because it's still the Important Question that you actually care about.

Again, I appreciate it's very laborious to capture what matters in a verifiable question.  If there is a particular topic that you think is missing something, please offer suggestions for new ways to capture what you believe is missing -- if that thing actually corresponds to reality in some provable way.

Overall I found this post misleading and confused.  At several points, I had no idea what you were talking about.  I suspect you're doing this because you like (some) hedgehogs, have a vested interest in their continued prestige, and want to rationalize ways that foxes are more misguided.  I think this has been a persistent feature of what you've said about this topic, and I don't think it will change.

If anyone wants to learn about this failure mode, from someone who knows what they are talking about, I highly recommend the work of David Manheim.  He's an excellent track-recorded forecaster who has done good work on Goodhart's Law, and has thought about how this relates to forecasting.

Edited to slightly change the wording emphasis, de-italicize some things that didn't really need italics, etc.

Comment by Jotto999 on AGI in our lifetimes is wishful thinking · 2022-11-01T23:44:33.647Z · LW · GW

I think many people on this site over-update on recent progress.  But I also doubt the opposite extreme you're at.

I think it's very unlikely (<10% chance) that we'll see AGI within the next 50 years, and entirely possible (>25% chance) that it will take over 500 years.

Even just evolutionary meta-algorithms would probably have runaway progress by 500 years.  That is, without humans getting super specific, deep math insights.  This is easy to imagine with the enormously higher yearly ASIC hardware fabrication we'd be seeing long before then.  I don't think a 500-year timeframe would take an unexpected math obstacle; it would take a global catastrophe.

I'd give this formulation of AGI a 93% chance of happening by 2522, and 40% by 2072.  If I could manage to submit a post before December, I'd be arguing for the Future Fund prize to update to a later timeline.  But not this much later.

Comment by Jotto999 on Have you noticed any ways that rationalists differ? [Brainstorming session] · 2022-10-23T16:54:14.008Z · LW · GW

I would love to see proper data on this.  In particular, including the facets and not just broad buckets.  Or if possible, even including findings for specific items.

The ones I've met at a meetup seemed (compared to the broader population):

-Very high in Interest in ideas, which was by far the most noticeable trend.
-Introverted
-Somewhat neurotic

Agreeableness was mixed.  Some were unfailingly agreeable, and some were starkly low in agreeableness.  Maybe data would show a clear trend on facets or items.  For the more strongly utilitarian ones, as a group, I'd speculate they are lower in Honesty-Humility from HEXACO.  Yet none ever seemed to make me "worry" in that way, as if they couldn't even manage to have Dark Triad traits without being helpful.

Comment by Jotto999 on In the very near future the internet will answer all of our questions and that makes me sad · 2022-10-22T18:08:43.872Z · LW · GW

I mean that Google themselves wouldn't want something that could get them lawsuits, and if they generate stuff, yes they'll have a selection for accuracy.  If someone is interested in AI-Dr-Oz's cures and searches for those, I'm sure Google will be happy to provide.  The market for that will be huge, and I'm not predicting that crap will go away.

Yes Google does select, now.  The ocean of garbage is that bad.  For people making genuine inquiries, often the best search providers can do right now is defer to authority websites.  If we're talking specifically about interpreting medical papers, why don't you think they'll have a selection for accuracy?

Comment by Jotto999 on In the very near future the internet will answer all of our questions and that makes me sad · 2022-10-22T15:52:21.397Z · LW · GW

In the first example it sounds like the engine is fabricating a false testimony.  Was that an intentional attribute in the example? I guess fictionalizing will happen lots, but I don't expect Google to use that particular method and jeopardize credibility.

For the second example, I assume there will be heavy selection against fabricating incorrect medical advice, at least for Google.

For genuine best-guess attempts to answer the question? I will be concerned if that doesn't happen in a few years.  What's the matter?

Comment by Jotto999 on The probability that Artificial General Intelligence will be developed by 2043 is extremely low. · 2022-10-13T02:11:34.269Z · LW · GW

That's a really good idea, changing the title.  You can also try adding a little paragraph in italics, as a brief little note for readers clarifying which probability you're giving.

Comment by Jotto999 on The probability that Artificial General Intelligence will be developed by 2043 is extremely low. · 2022-10-13T01:03:50.757Z · LW · GW

I haven't read this post, I just wanted to speculate about the downvoting, in case it helps.

Assigning "zero" probability is an infinite amount of error.  In practice you wouldn't be able to compute the log error.  More colloquially, you're infinitely confident about something, which in practice and expectation can be described as being infinitely wrong.  Being mistaken is inevitable at some point.  If someone gives 100% or 0%, that's associated with them being very bad at forecasting.

I expect a lot of the downvotes are people noticing that you gave it 0%, and that's strong evidence you're very uncalibrated as a forecaster.  For what it's worth, I'm in the highscores on Metaculus, and I'd interpret that signal the same way.

Skimming a couple seconds more, I suspect the overall essay's writing style doesn't really explain how the material changes our probability estimate.  This makes the essay seem indistinguishable from confused/irrelevant arguments about the forecast.  For example if I try skim reading the Conclusion section, I can't even tell if the essay's topics really change the probability that human jobs can be done by some computer for $25/hr or less (that's the criterion from the original prize post).

I have no reason not to think you were being genuine, and you are obviously knowledgeable.  I think a potential productive next step could be if you consulted someone with a forecasting track record, or read Philip Tetlock's stuff.  The community is probably reacting to red flags about calibration, and (possibly) a writing style that doesn't make it clear how this updates the forecast.

Comment by Jotto999 on It’s Probably Not Lithium · 2022-07-05T06:11:03.320Z · LW · GW

Why "ecologically realistic food"? And which types of realism are you going to pick?

Overfeeding and obesity are common problems in pets, which are mostly not bred to gain weight the way cows are.

My family has kept many kinds of animals.  If you give bunny rabbits as many veggies as they want, a large fraction becomes obese.  And guinea pigs too.  And for their own favorite foods, tropical fish do too.  Cats too.

In fact, I have never noticed a species that doesn't end up with a substantial fraction becoming obese, if you go out of your way to prepare the most-compelling food for them, and then give that in limitless amounts.  Even lower-quality, not-as-compelling foods free-fed can cause some obesity.  Do you even know of any animal species like this?!

If there is large variation in susceptibility (which there would be) to the ostensible environmental contaminant, there should be species that you can free-feed without them becoming obese.

Comment by Jotto999 on It’s Probably Not Lithium · 2022-07-04T15:30:32.233Z · LW · GW

Do you have any empirical evidence for either of the following?

  1. Farmers were historically wrong to think that free-feeding their animals would tend to fatten them up, OR they didn't believe it has that effect.
  2. Prior to the more recent novel contaminants, humans were an exception among animals to this general trend, that free-feeding tends to fatten animals up.
Comment by Jotto999 on It’s Probably Not Lithium · 2022-07-03T18:44:56.280Z · LW · GW

I'm going to bury this a bit deeper in the comment chain because it's no more indicative than Eliezer's anecdote.  But FWIW,

I am in the (very fortunate) minority who struggles to gain much weight, and has always been skinny.  But when I have more tasty food around, especially if it's prepared for me and just sitting there, I absolutely eat more, and manage to climb up from ~146 to ~148 or ~150 (pounds).  It's unimaginable that this effect isn't true for me.

Comment by Jotto999 on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2022-06-30T04:05:50.796Z · LW · GW

I no longer see exchanges with you as a good use of energy, unless you're able to describe some of the strawmanning of me you've done and come clean about that.

EDIT: Since this is being downvoted, here is a comment chain where Rob Bensinger interpreted me in ways that are bizarre, such as suggesting that I think Eliezer is saying he has "a crystal ball", or that "if you record any prediction anywhere other than Metaculus (that doesn't have similarly good tools for representing probability distributions), you're a con artist".  Things that sound thematically similar to what I was saying, but were weird, persistent extremes that I don't see as good-faith readings of me.  It kept happening over Twitter, then again on LW.  At no point have I felt he's trying to understand what I actually think.  So I don't see the point of continuing with him.

Comment by Jotto999 on Yudkowsky and Christiano discuss "Takeoff Speeds" · 2022-06-29T02:18:51.858Z · LW · GW

I see what you're saying, but it looks like you're strawmanning me yet again with a more extreme version of my position.  You've done that several times and you need to stop that.

What you've argued here prevents me from questioning the forecasting performance of every pundit who I can't formally score, which is ~all of them.

Yes, it's not a real forecasting track record unless it meets the sort of criteria that are fairly well understood in Tetlockian research.  And neither is Ben Garfinkel's post -- it doesn't give us a forecasting track record like one on Metaculus.

But if a non-track-recorded person suggests they've been doing a good job anticipating things, it's quite reasonable to point out non-scorable things they said that seem incorrect, even with no way to score it.

In an earlier draft of my essay, I considered getting into bets he's made (several of which he's lost).  I ended up not including those things.  Partly because my focus was waning and it was more attainable to stick to the meta-level point.  And partly because I thought the essay might be better if it was more focused.  I don't think there is literally zero information about his forecasting performance (that's not plausible), but it seemed like it would be more of a distraction from my epistemic point.  Bets are not as informative as Metaculus-style forecasts, but they are better than nothing.  This stuff is a spectrum; even Metaculus doesn't retain some kinds of information about the forecaster.  Still, I didn't get into it, though I could have.

But I ended up later editing in a link to one of Paul's comments, where he describes some reasons that Robin looks pretty bad in hindsight, but also includes several things Eliezer said that seem quite off.  None of those are scorable.  But I added in a link to that, because Eliezer explicitly claimed he came across better in that debate, which overall he may have, but it's actually more mixed than that, and that's relevant to my meta-point that one can obfuscate these things without a proper track record.  And Ben Garfinkel's post is similarly relevant.

If the community felt more ambivalently about Eliezer's forecasts, or even if Eliezer was more ambivalent about his own forecasts? And then there was some guy trying to convince people he has made bad forecasts? Then your objection of one-sidedness would make much more sense to me.  That's not what this is.

Eliezer actively tells people he's anticipating things well, but he deliberately prevents his forecasts from being scorable.  Pundits do that too, and you bet I would eagerly criticize vague non-scorable stuff they said that seems wrong.  And yes, I would retweet someone criticizing those things too.  Does that also bother you?

Comment by Jotto999 on Where I agree and disagree with Eliezer · 2022-06-25T18:24:22.001Z · LW · GW
  1. I disagree with the community on that.  Knocking out the Silver Turing Test, Montezuma (in the way described), 90% equivalent on Winogrande, and 75th percentile on the math SAT will either take longer to be actually demonstrated in a unified ML system, OR it will happen way sooner than 39 months before "an AI which can perform any task humans can perform in 2021, as well or superior to the best humans in their domain", which is incredibly broad.  If the questions mean what they are written to mean, as I read them, it's a hell of a lot more than 39 months (median community estimate).
  2. The thing I said is about some important scenarios described by people giving significant probability to a hostile hard takeoff scenario.  I included the comment here in this subthread because I don't think it contributed much to the discussion.
Comment by Jotto999 on Where I agree and disagree with Eliezer · 2022-06-20T23:49:29.958Z · LW · GW

Very broadly,

in 2030 it will still be fairly weird and undersubstantiated to say that a dev's project might accidentally turn everyone's atoms into ML hardware, or might accidentally cause a Dyson sphere to be built.

Comment by Jotto999 on On A List of Lethalities · 2022-06-14T03:16:17.843Z · LW · GW

I haven't read most of the post.  But in the first few paragraphs, you mention how he was ranting, and you interpret that as an upward update on the risk of AI extinction:

The fact that this is the post we got, as opposed to a different (in many ways better) post, is a reflection of the fact that our Earth is failing to understand what we are facing. It is failing to look the problem in the eye, let alone make real attempts at solutions.

But that's extremely weak evidence.  People rant all the time, including while being incorrect.  Him formatting a message as a rant isn't evidence of an increased risk of doom compared to yesterday, unless you already agree with him.

Comment by Jotto999 on Biology-Inspired AGI Timelines: The Trick That Never Works · 2022-06-05T14:38:54.352Z · LW · GW

My posting this comment will be contrary to the moderation disclaimer advising not to talk about tone.  But FWIW, I react similarly and I skip reading things written in this way, interpreting them as manipulating me into believing the writer is hypercompetent.

Comment by Jotto999 on What an actually pessimistic containment strategy looks like · 2022-06-04T14:25:13.072Z · LW · GW

Most animals aren't in a factory farm.

Comment by Jotto999 on What an actually pessimistic containment strategy looks like · 2022-06-04T13:09:52.202Z · LW · GW

Yes, but I predict it will end up applying to most non-humans too.

Comment by Jotto999 on What an actually pessimistic containment strategy looks like · 2022-06-04T12:06:10.023Z · LW · GW

I predict that most creatures would disagree with you, if an honest poll about themselves were done, and not about some far abstraction of other people.  (EDIT: Link is about humans, but I predict most non-humans also prefer to be alive and aren't better off dead.)

Which is also my prior on the attitude of "it's fine if everyone dies" people.  Of historical cases where someone thought that, few people agreed, and we ended up glad they didn't get their way.  I'm sure it's the same all over again here with you, and some other people I've heard express this attitude.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-28T00:28:49.382Z · LW · GW

This is good in some ways but also very misleading.  It selects against people who place forecasts on lots of questions, and also against people who place forecasts on questions that have already been open for a long time, and who don't have time to later update on most of them.

I'd say it's a very good way to measure performance within a tournament, but in the broader jungle of questions it misses an awful lot.

E.g. I have predictions on 1,114 questions, and the majority were never updated, and had negligible energy put into them.

Sometimes for fun I used to place my first (and only) forecast on questions that were just about to close.  I liked it because this made it easier to compare my performance on distribution questions, versus the community, because the final summary would only show that for the final snapshot.  But of course, if you do this then you will get very few points per question.  But if I look at my results on those, it's normal for me to slightly outperform the community median.

This isn't captured by my average points per question across all questions, where I underperform (partly because I never updated on most of those questions, and partly because a lot of it is amusingly obscure stuff I put little effort into).  Though, that's not to suggest I'm particularly great either (I'm not), but I digress.

If we're trying to predict a forecaster's insight on their next discrete prediction, then a more useful metric would be the forecaster's log score versus the community's log score on the same questions, at the time they placed those forecasts.  Naturally this isn't a good way to score tournaments, where people should update often, and focus on high effort per question.  But if we're trying to estimate their judgment from the broader jungle of Metaculus questions, then that would be much more informative than a points average per question.
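
A minimal sketch of that metric (the numbers and function are purely illustrative, just to pin down what I mean): for each question, take the forecaster's log score minus the community's log score at the same timestamp, then average.

```python
import math

def relative_log_score(forecasts):
    """Average of (forecaster log score - community log score), per question,
    evaluated at the time the forecast was placed.  Positive means outperforming."""
    diffs = []
    for my_p, community_p, resolved_yes in forecasts:
        mine = math.log(my_p if resolved_yes else 1 - my_p)
        community = math.log(community_p if resolved_yes else 1 - community_p)
        diffs.append(mine - community)
    return sum(diffs) / len(diffs)

# (my probability, community probability at that time, did it resolve Yes)
example = [(0.70, 0.60, True), (0.20, 0.30, False), (0.55, 0.50, True)]
print(relative_log_score(example))  # positive here, i.e. slightly better than the community
```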

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-26T01:04:43.043Z · LW · GW

Some more misunderstanding:

'if you record any prediction anywhere other than Metaculus (that doesn't have similarly good tools for representing probability distributions), you're a con artist'. Seems way too extreme.

No, I don't mean that whether it's recorded on Metaculus is what determines whether someone is a con artist.  It's mentioned in that tweet because if you're going to bother doing it, might as well go all the way and show a distribution.

But even if he just posted a confidence interval, on some site other than Metaculus, that would be a huge upgrade.  Because then anyone could add it to a spreadsheet of scorable forecasts, and reconstruct it without too much effort.

'if you record any prediction anywhere other than Metaculus (that doesn't have similarly good tools for representing probability distributions), you're a con artist'. Seems way too extreme.

No, that's not what I'm saying.  The main thing is that they be scorable.  But if someone is going to do it at all, then doing it on Metaculus just makes more sense -- the administrative work is already taken care of, and there's no risk of cherry-picking nor omission.

Also, from another reply you gave:

Also, I think you said on Twitter that Eliezer's a liar unless he generates some AI prediction that lets us easily falsify his views in the near future? Which seems to require that he have very narrow confidence intervals about very near-term events in AI.

I never used the term "liar".  The thing he's doing that I think is bad is more like what a pundit does, like the guy who calls recessions, a sort of epistemic conning.  "Lying" is different, at least to me.

More importantly, no, he doesn't necessarily need to have really narrow distributions, and I don't know why you think this.  Only if his distribution were squashed close against the "Now" side of the chart would it be "narrower" -- and if that's what Eliezer thinks, if he's saying himself it's earlier than x date, then the graph just looks a bit narrower and shifted to the left, and it simply reflects what he believes.

There's nothing about how we score forecasters that requires him to have "very narrow" confidence intervals about very near-term events in AI, in order to measure alpha.  To help me understand, can you describe why you think this? Why don't you think alpha would start being measurable with merely slightly narrower confidence intervals than the community's, centered closer to the actual outcome?

EDIT a week later: I have decided that several of your misunderstandings should be considered strawmanning, and I've switched from upvoting some of your comments here to downvoting them.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-25T02:25:47.990Z · LW · GW

First I commend the effort you're putting into responding to me, and I probably can't reciprocate as much.

But here is a major point I suspect you are misunderstanding:

It seems like you're interpreting EY as claiming 'I have a crystal ball that gives me unique power to precisely time AGI', whereas I interpret EY as saying that one particular Metaculus estimate is wrong.

This is neither necessary for my argument, nor at any point have I thought he's saying he can "precisely time AGI".

If he thought it was going to happen earlier than the community did, it would be easy to show an example distribution of his, without high precision (or much effort).  Literally just add a distribution into the box on the question page, click and drag the sliders so it's somewhere that seems reasonable to him, and submit it.  He could then screenshot it.  Even just copypasting the confidence interval figures would work.

Note that this doesn't mean making the date range very narrow (confident); that's unrelated.  He can still be quite uncertain about specific times.  Here's an example of me somewhat disagreeing with the community.  Of course now the community has updated to earlier, but he can still do these things, and should.  It doesn't even need to be screenshotted really; just posting it in the Metaculus thread works.

And further, this point you make here:

Eliezer hasn't said he thinks he can do better than Metaculus on arbitrary questions. He's just said he thinks Metaculus is wrong on one specific question.

My argument doesn't need him to necessarily be better at "arbitrary questions".  If Eliezer believes Metaculus is wrong on one specific question, he can trivially show a better answer.  If he does this on a few questions and it gets properly scored, that's a track record.

You mentioned other things, such as how much it would transfer to broader, longer-term questions.  That isn't known and I can't stay up late typing about this, but at the very minimum people can demonstrate they are calibrated, even if you believe there is zero knowledge transfer from narrower/shorter questions to broader/longer ones.

Going to have to stop it there for today, but I would end this comment with a feeling: it feels like I'm mostly debating people who think they can predict when Tetlock's findings don't apply, and so reliably that it's unnecessary to forecast properly or transparently, and it seems like they don't understand.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-23T11:32:38.218Z · LW · GW

You're right that volatility is an additional reason why his not giving his actual distribution makes it less informative.

It's interesting to me that in his comment, he states:

I do admit, it's not a good look that I once again understate my position by so much compared to what the reality turns out to be, especially after having made that mistake a few times before.

He sees it as significant evidence that his position wasn't extreme enough.  But he didn't even clearly give his position, and "the reality" is a thing that is determined by the specific question resolution when that day comes.  Instead of that actual reality, and because of how abruptly the community ended up shifting, Eliezer seems to be interpreting that to mean that his position about that reality is not extreme enough.  Those 2 things are somewhat related but pretty weakly, so it seems like rationalizing for him to frame it as showing his forecast isn't extreme enough.

I don't expect him to spend time engaging with me, but for what it's worth, to me the comment he wrote here doesn't address anything I brought up; it's essentially just him restating that he interprets this as a nice addition to his "forecasting track record".  He certainly could have made it part of a meaningful track record! It was a tantalizing candidate for such a thing, but he chose not to, yet he expects people to interpret it the same way, which doesn't make sense.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T21:08:21.856Z · LW · GW
Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T20:24:00.339Z · LW · GW

These sorts of observations sound promising for someone's potential as a forecaster.  But by themselves, they are massively easier to cherry-pick, fudge, omit, or re-define than proper forecasts are.

When you see other people make non-specific "predictions", how do you score them? How do you know the scoring that you're doing is coherent, and isn't rationalizing? How do you avoid the various pitfalls that Tetlock wrote about? How do you *ducks stern glance* score yourself on any of that, in a way that you'll know isn't rationalizing?

For emphasis, in this comment you reinforce that you consider it a successful advance prediction.  This gives very little information about your forecasting accuracy.  We don't even know what your actual distribution is, and it's a long time before this resolves; we only know it went in your direction.  I claim that to critique other people's proper-scored forecasts, you should be transparent and give your own.

EDIT: Pasted from another comment I wrote:

Instead of that actual [future resolution] reality, and because of how abruptly the community ended up shifting, Eliezer seems to be interpreting that to mean that his position about that reality is not extreme enough.  Those 2 things are somewhat related but pretty weakly, so it seems like rationalizing for him to frame it as showing his forecast isn't extreme enough.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T14:29:51.068Z · LW · GW

No, success and fame are not very informative about forecasting accuracy.  Yes they are strongly indicative of other competencies, but you shouldn't mix those in with our measure of forecasting.  And nebulous unscorable statements don't at all work as "success", too cherry-picked and unworkable.  Musk is famously uncalibrated with famously bad timeline predictions in his domain! I don't think you should be glossing over that in this context by saying "Well he's successful..."

If we are talking about measuring forecasting performance, then it's more like comparing tournament Karate with trench warfare.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T14:13:40.786Z · LW · GW

This is actually a testable claim: that forecasts ends up trailing things that Eliezer said 10 years later.

Not really, unless you accept corruptible formats of forecasts with lots of wiggle room.  It isn't true that we can have a clear view of how he is forecasting if he skips proper forecasting.

I think you're right that it's impressive he alerted people to potential AI risks.  But if you think that's an informative forecasting track record, I don't think that heuristic is remotely workable in measuring forecasters.

Doing a precise prediction when you don't have the information

I feel like there's been a lot of misunderstanding about why Eliezer doesn't want to give timeline predictions, when he said it repeatedly: he thinks there is just not enough bits of evidence for making a precise prediction. There is enough evidence to be pessimistic, and realize we're running out of time, but I think he would see giving a precise year like a strong epistemic sin. Realize when you have very little evidence, instead of inventing some to make your forecast more concrete.

To clarify, I'm not saying he should give a specific year that he thinks it happens, like a 50% confidence interval of 12 months.  That would be nuts.  Per Tetlock, it just isn't true that you can't (or shouldn't) give specific numbers when you are uncertain.  You just give a wider distribution.  And not giving that unambiguous distribution when you're very uncertain just obfuscates, and is the real epistemic sin.

As for the financial pundit example, there's a massive disanalogy: it's easy to predict that there will be a crash. Everybody does it, we have past examples to generalize from, and models and theories accepted by a lot of people for why they might be inevitable. On the other hand, when Eliezer started talking about AI Risks and investing himself fully in them, nobody gave a shit about it or took it seriously. This was not an obvious prediction that everyone was making, and he gave far more details than just saying "AI Risks, man".

I don't understand what you mean by the bolded part.  What do you mean everybody does it? No they don't.  Some people pretend to, though.  The analogy is relevant in the sense that Eliezer should show that he is calibrated at predicting AI risks, rather than only arguing so.  The details you mention don't work as a proper forecasting track record.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T13:53:25.510Z · LW · GW

An analogy could be Elon Musk.  He's done great things that I personally am absolutely incapable of.  And he does deserve praise for those things.  And indeed, Eliezer was a big influence on me.  But he gives extreme predictions that probably won't age well.

Him starting this site and writing a million words about rationality is wonderful and outstanding.  But do you think it predicts forecasting performance nearly as well as actual performance at proper forecasting? I claim it doesn't come anywhere near as good a predictive factor as just making some actual forecasts and seeing what happens, and I don't see the opposing position holding well at all.  You can argue that "we care about other things too than just forecasting ability", but in this thread I am specifically referring to his implied forecasting accuracy, not his other accomplishments.  The way you're referring to Bayes points here doesn't seem workable or coherent, any more than Musk Points would tell me his predictions are accurate.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T13:40:30.219Z · LW · GW

You're right about the definition of fearmongering then.  I think he clearly tries to make people worried, and I often find it unreasonable.  But I don't expect everyone to think he meets the "unreasonable" criterion.

On the second quote in your top comment: indeed, most scored forecasters with a good track record don't give a 25% risk of extinction before, say, 2200.

And as for 99%: this is wackadoodle wildly extreme, and probably off by a factor of roughly ~1,000x in odds format.  If I assume the post's implied probability is actually closer to 99%, then it seems egregious.  You mention these >25% figures are not that out of place for MIRI, but what does that tell us? This domain probably isn't that special, and humans would need to be calibrated forecasters for me to care much about their forecasts.
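
Rough arithmetic for what a ~1,000x error in odds format means (my own back-of-envelope numbers, not anything from the post):

```python
def to_odds(p):
    return p / (1 - p)

def from_odds(o):
    return o / (1 + o)

print(to_odds(0.99) / to_odds(0.10))    # ~891 -- going from 10% to 99% is roughly a 1,000x move in odds
print(from_odds(to_odds(0.10) * 1000))  # ~0.991 -- shifting 10% by 1,000x in odds lands near 99%
```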

Here are some claims I stand by:

I genuinely think the picture painted by that post (and estimates near 99%) overstates the odds of extinction soon by a factor of roughly ~1,000x.  (For intuition, that's similar to going from 10% to 99%.)

I genuinely think these extreme figures are largely coming from people who haven't demonstrated calibrated forecasting, which would make it additionally suspicious in any other domain, and should here too.

I genuinely think Eliezer does something harmful by overstating the odds, by an amount that isn't reasonable.

I genuinely think it's bad of him to criticize other proper-scored forecasts without being transparent about his own, so a fair comparison could be made.

 

On insults

This part I've moved to the bottom of this comment because I think it's less central to the claim I'm making.  As for the criterion of "insulting" or sneering: well, a bunch of people (including me) experienced it that way.  Some people I heard from described it as infuriating that he was saying these things without being transparent about his own forecasts.  And yes, the following does seem to imply other people aren't sane or self-respecting:

To be a slightly better Bayesian is to spend your entire life watching others slowly update in excruciatingly predictable directions that you jumped ahead of 6 years earlier so that your remaining life could be a random epistemic walk like a sane person with self-respect.

Putting aside whether or not you think I have an axe to grind, don't you see how some people would see that as insulting or sneering?

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T01:21:07.888Z · LW · GW

This is a great list and I thank you for describing it.  Good examples of one of the claims I'm making -- there's nothing about their debate that tells us much meaningful about Eliezer's forecasting track record.  In fact I would like to link to this comment in the original post because it seems like important supplemental material, for people who are convinced that the debate was one-sided.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T00:54:37.373Z · LW · GW

That doesn't seem very relevant -- they can have criticisms of a platform without necessarily doing forecasting.  Also, a Brier score on its own doesn't tell us much without a comparison to other people, due to how much different questions can vary in difficulty.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T00:40:34.008Z · LW · GW

This isn't a good description of being on Metaculus versus being a Bayesian.

How does one measure if they are "being a Bayesian"? The general point is you can't, unless you are being scored.  You find out by making forecasts -- if you aren't updating you get fewer points, or even lose points.  Otherwise you have people who are just saying things that thematically sound Bayesian but don't mean very much in terms of updated beliefs.  Partly I'm making an epistemic claim that Eliezer can't actually know if he's being a good Bayesian without proper forecasting.  You can check out Tetlock's work if you're unsure why that would be the case, though I mention it in the post.

The more central epistemic claim I'm making in this essay: if someone says they are doing a better job of forecasting a topic than other people, but they aren't actually placing forecasts so we could empirically test if they are, then that person's forecasts should be held in high suspicion.  I'm claiming this would be the same in every other domain, AI timelines are unlikely to be that special, and his eminence doesn't really buy him a good justification for why we would hold him to drastically lower standards about measuring his forecast accuracy.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T00:29:48.937Z · LW · GW

A couple questions:

  1. It's quite easy and common to insult groups of people.  And some other people and I found him very sneering in that post.  In order to count as "calling them excruciatingly predictable", it seems like you're suggesting Eliezer would have had to be naming specific people, and that it doesn't count if it's about a group (people who had placed forecasts in that question)? If yes, why?
  2. For that post that I described as fearmongering, it's unrelated whether his "intention" is fearmongering or not.  I would like it if you elaborated.  The post has a starkly doomsday attitude.  We could just say it's an April Fool's joke, but the problem with this retort is that Eliezer has said quite a few things with a similar attitude.  And in the section "addressing" whether it's an April Fool's joke, he first suggests that it is, but then implies that he intends for the reader to take the message very seriously -- so, not really.

    Roughly, the post seems to imply a chance of imminent extinction that is, like, a factor of ~100x higher (in odds format) than what scored aggregated forecasters roughly give.  Such an extreme prediction could indeed be described as fearmongering.

    In order to count as "fearmongering", are you saying he would've had to meet the requirement of being motivated specifically for fearmongering? Because that's what your last sentence suggests.
Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-22T00:02:46.044Z · LW · GW

Upvoted -- I agree that that bet they made should be included in the post, around when I'm mentioning how Eliezer told Paul he doesn't have "a track record like this".  He did decline to bet Paul in that comment I am quoting, claiming Paul "didn't need to bet him".  But you are right that it's wrong not to include the bet they do have going, and not just the one with Bryan Caplan.  I've made some edits to include it.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T17:51:05.952Z · LW · GW

I agree about ems being nowhere in sight, versus steady progress in other methods.  I also disagree with Hanson about timeframe (I don't see it taking 300 years).  I also agree that general algorithms will be very important, probably more important than Hanson said.  I also put a lower probability on a prolonged AI winter than Hanson.

But as you said, AGI still isn't here.  I'd take it a step further -- did the Hanson debates even have unambiguous, coherent ideas of what "AGI" refers to?

Of progress toward AGI, "how much" happened since the Hanson debate? This is thoroughly nebulous and gives very little information about a forecasting track record, even though I disagree with Hanson.  With the way Eliezer is positioned in this debate, he can just point to any impressive developments, and say that goes in his favor.  We have practically no way of objectively evaluating that.  If someone already agrees "the event happened", they update that Eliezer got it right.  If they disagree, or if they aren't sure what the criteria were, they don't.

Being able to say post-hoc that Eliezer "looks closer to the truth" is very different from how we measure forecasting performance, and for good reason.  If I was judging this, the "prediction" absolutely resolves as "ambiguous", despite me disagreeing with Hanson on more points in their debate.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T06:01:00.321Z · LW · GW

Several different tough hurdles have to be cleared, and usually aren't.  For one, they would have to agree on criteria that they both think are relevant enough, and that they can define well enough for it to be resolvable.

They also have to agree on an offer: the odds, and the amount of money.

They then also have to be risk-tolerant enough to go through knowing they may lose money, or may be humiliated somewhat (though with really good betting etiquette, IMO it need not be humiliating if they're good sports about it).  And there's also the obvious counterparty risk, as people may simply not pay up.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T05:56:12.591Z · LW · GW

That's good feedback.  I can see why the wording I used gives the wrong impression -- he didn't literally say out loud that he has "a great forecasting track record".  It still seems to me heavily implied by several things he's said, especially what he said to Paul.

I think the point you raise is valid enough.  I have crossed out the word "claimed" in the essay, and replaced it with "implied".

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T05:27:18.338Z · LW · GW

Well, when he says something like this:

I claim that I came off better than Robin Hanson in our FOOM debate compared to the way that history went.  I'd claim that my early judgments of the probable importance of AGI, at all, stood up generally better than early non-Yudkowskian EA talking about that.

...He's saying something notably positive about some sort of track record.  That plus the comments he made about the Metaculus updates, and he clearly thinks he's been doing well.  Yes, he doesn't have a track record on Metaculus (I'm not even aware of him having a profile).  But if I just read what he writes and see what he's implying, he thinks he's doing much better at predicting events than somebody, and many of those somebodys seem to be people closer to Hanson's view, and also seem to be Metaculus predictors.

Also, perhaps I'm using the word "great" more informally than you in this context.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T03:09:21.213Z · LW · GW

That might be possible, but it would take a lot of effort to make resolvable.  Who is "similar to Eliezer", and how do we define that in advance? Which forecasting questions are we going to check their Brier score on? Which not-like-Eliezer forecasters are we going to compare them to (since scores are much more informative for ranking forecasters, than in isolation)? Etc.  I'd rather people like Eliezer just placed their forecasts properly!

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T02:19:57.539Z · LW · GW

I like your list!

Definitely agree that narrow questions can lose the spirit of it.  The forecasting community can hedge against this by having a variety of questions that try to get at it from "different angles".

For example, that person in 1970 could set up a basket of questions:

  1. Percent of GDP that would be computing-related instead of rocket-related.
  2. Growth in the largest computer by computational power, versus the growth in the longest distance traveled by rocket, etc.
  3. Growth in the number of people who had flown in a rocket, versus the number of people who own computers.
  4. Changes in dollars per kilo of cargo hauled into space, versus changes in FLOPS-per-dollar.

Of course, I understand completely if people in 1970 didn't know about Tetlock's modern work.  But for big important questions, today, I don't see why we shouldn't just use modern proper forecasting technique.  Admittedly it is laborious! People have been struggling to write good AI timeline questions for years.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T02:02:57.316Z · LW · GW

I'm fond of "x percent of sun's energy used"-syle stuff because I would expect runaway superintelligence to probably go ahead and use that energy, and it has a decent shot at being resolvable.

But I think we need to be careful about assuming all the crazy-ambitious milestones end up only a few months from each other.  You could have a situation where cosmic industrialization is explosively fast heading away from Earth, with every incentive to send out seed ships for a land-grab.  But it's plausible that could be going on while here on Earth things are much slower, if there are some big incumbents that maintain control and develop more slowly.  I'm not super sure that'll happen, but it's not obvious that all the big milestones happen within a few months of each other, if we assume local control is maintained and the runaway Foom goes elsewhere.

This is an example of why I think it does matter what milestone people pick, but it will often be for reasons that are very hard to foresee.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T00:54:34.932Z · LW · GW

Ah by the way, I think the link you posted accidentally links to this post.

Comment by Jotto999 on Beware boasting about non-existent forecasting track records · 2022-05-21T00:42:18.996Z · LW · GW

It tells us essentially nothing.  How are you going to score the degree to which it turned out to be "a sign of real progress towards AGI"? I understand it feels impressive but it's far too nebulous to work as a forecasting track record.