$300 Fermi Model Competition

post by ozziegooen · 2025-02-03T19:47:09.270Z · LW · GW · 9 comments


Comments sorted by top scores.

comment by niplav · 2025-02-12T16:19:23.504Z · LW(p) · GW(p)

Model is here.

Background: I was thinking about the scaling-first picture and the bitter lesson, and how one might interpret it in two different ways:

  1. One is that deep learning is necessary and sufficient for intelligence: there's no separate thing called "thinking", no cleverer way to approximate Bayesian inference, no abduction, etc.
  2. The other is that deep learning is sufficient for radical capabilities and superhuman intelligence, but doesn't exclude there being even smarter ways of performing cognition.

We have a lot of evidence about the second one, but less about the first one. Evidence for the first one takes the form of "smart humans tried for 75 years, spending ??? person-years on AI research", so I decided to use Squiggle to estimate the amount of AI research that has happened so far.

Result: 380k to 6.3M person-years, mean 1.5M.

Technique: Hand-written Squiggle code (I didn't use AI for this one).
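For readers unfamiliar with Squiggle, a minimal sketch of this kind of estimate might look like the following. The date range and researcher-count interval are my own illustrative assumptions, not the parameters of the linked model:

```
// Rough total person-years of AI research, 1950–2025.
// The 90% CI below is an illustrative assumption.
years = 2025 - 1950
avgResearchersPerYear = 5k to 80k  // average headcount over the whole period
personYears = years * avgResearchersPerYear
personYears
```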

Replies from: niplav
comment by niplav · 2025-02-12T16:38:53.070Z · LW(p) · GW(p)

I don't know whether this will count as a separate submission (I'd prefer to treat these two models as one), but I did one more round of improvement on the model.

New Model is here.

Background is the same as above.

Result: ~150k to 5.4M person-years of AI research, mean 1.7M.

Technique: I pasted the original model into Claude Sonnet and asked it to suggest improvements. I then gave the original model and some hand-written suggested improvements to Squiggle AI, instructing it to add different growth modes for the AI winters and to lower the variance of the number of AI researchers in the early years and in years close to the present.
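One way to encode "different growth modes for the AI winters" is to sum era-by-era contributions, with depressed headcounts during the winters. The sketch below is an illustration with assumed era boundaries and CIs, not the submitted model:

```
// Era-by-era person-years of AI research, with lower headcounts
// during the AI winters. All dates and 90% CIs are assumptions.
early   = (1974 - 1950) * (100 to 2k)   // founding era through early growth
winter1 = (1980 - 1974) * (50 to 1k)    // first AI winter
boom    = (1987 - 1980) * (500 to 5k)   // expert-systems boom
winter2 = (1993 - 1987) * (300 to 3k)   // second AI winter
modern  = (2025 - 1993) * (5k to 200k)  // ML and deep-learning eras
totalPersonYears = early + winter1 + boom + winter2 + modern
totalPersonYears
```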

Replies from: ozziegooen
comment by ozziegooen · 2025-02-14T20:20:17.151Z · LW(p) · GW(p)

That's fine; we'll just review the updated model then.

We'll only start evaluating models after the cut-off date, so feel free to make edits/updates before then. In general, we'll only use the most recent version of each submitted model. 

comment by ozziegooen · 2025-02-12T03:49:22.525Z · LW(p) · GW(p)

Submissions end soon (this Sunday)! If there aren't many, then this can be an easy $300 for someone. 

comment by Steven Byrnes (steve2152) · 2025-02-03T21:12:04.792Z · LW(p) · GW(p)

I’m not sure if this is what you’re looking for, but here’s a fun little thing that came up recently when I was writing this post [LW · GW]:

Summary: “Thinking really hard for five seconds” probably involves less primary metabolic energy expenditure than scratching your nose. (Some people might find this obvious, but other people are under a mistaken impression that getting mentally tired and getting physically tired are both part of the same energy-preservation drive. My belief, see here [LW · GW], is that the latter comes from an “innate drive to minimize voluntary motor control”, the former from an unrelated but parallel “innate drive to minimize voluntary attention control”.)

Model: The net extra primary metabolic energy expenditure required to think really hard for five seconds, compared to daydreaming for five seconds, may well be zero. For an upper bound, Raichle & Gusnard 2002 says “These changes are very small relative to the ongoing hemodynamic and metabolic activity of the brain. Attempts to measure whole brain changes in blood flow and metabolism during intense mental activity have failed to demonstrate any change. This finding is not entirely surprising considering both the accuracy of the methods and the small size of the observed changes. For example, local changes in blood flow measured with PET during most cognitive tasks are often 5% or less.” So it seems fair to assume it’s <<5% of the ≈20 W total, which gives <<1 W × 5 s = 5 J. Next, for comparison, what is the primary metabolic energy expenditure from scratching your nose? Well, for one thing, you need to lift your arm, which gives mgh ≈ 0.2 kg × 9.8 m/s² × 0.4 m ≈ 0.8 J of mechanical work. Divide by maybe 25% muscle efficiency to get 3.2 J. Plus more for holding your arm up, moving your finger, etc., so the total is almost certainly higher than the “thinking really hard” figure, which again is probably very much less than 5 J.
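The scratching-your-nose side of the comparison is simple enough to reproduce in a few lines of Squiggle. The numbers below are the ones from the paragraph above; the muscle efficiency is the stated rough assumption:

```
// Lower bound on the metabolic cost of scratching your nose.
armMass = 0.2           // kg lifted (rough figure for the arm)
g = 9.8                 // m/s^2
liftHeight = 0.4        // m
mechanicalWork = armMass * g * liftHeight         // ≈ 0.8 J
muscleEfficiency = 0.25                           // assumed ~25%
metabolicCost = mechanicalWork / muscleEfficiency // ≈ 3.2 J
metabolicCost
```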

Technique: As it happened, I asked Claude to do the first-pass scratching-your-nose calculation. It did a great job!

Replies from: ozziegooen, ozziegooen
comment by ozziegooen · 2025-02-03T21:22:18.897Z · LW(p) · GW(p)

By the way, I imagine you could do a better job with the evaluation prompts by having another LLM pass that formalizes the above and adds more context. For example, with an o1/R1 pass or a Squiggle AI pass, you could probably make something that considers a few more factors and brings in more stats.

comment by ozziegooen · 2025-02-03T21:17:23.575Z · LW(p) · GW(p)

That counts! Thanks for posting. I look forward to seeing what it will get scored as. 

comment by kairos_ (samir) · 2025-02-12T06:22:34.117Z · LW(p) · GW(p)

Thanks for hosting this competition!

Fermi Estimate: How many lives would be saved if every person in the West donated 10% of their income to EA-related, highly effective charities?

Model

  1. Donation Pool:
     – Assume “the West” produces roughly $40 trillion in GDP per year.
     – At a 10% donation rate, that yields about $4 trillion available annually.
  2. Rethinking Cost‐Effectiveness:
     – While past benchmarks often cite figures around $3,000 per life saved for top interventions, current estimates vary widely (from roughly $3,000 up to $20,000 per life) and only a limited pool of opportunities exists at the very low end.
     – In effect, the best interventions can only absorb a relatively small fraction of the enormous $4 trillion pool.
  3. Diminishing Returns and Saturation:
     To capture the idea that effective charity has a finite “absorption” capacity, we model the lives saved $L$ as:
       $L = L_{\text{max}} \times \left[1 - \exp\left(-\frac{D}{D_{\text{scale}}}\right)\right]$,
     where:
      • $D$ is the donation pool ($4 trillion),
      • $D_{\text{scale}}$ represents the funding scale over which cost-effectiveness declines, and
      • $L_{\text{max}}$ is the maximum number of lives that can be effectively saved given current intervention opportunities.
  4. Parameter Choices:
     – Based on global health data and the limited number of highly cost-effective interventions, we set $L_{\text{max}}$ in the range of about 10–15 million lives per year.
     – To reflect that the very best interventions are relatively small in total funding size, we take $D_{\text{scale}}$ to be around $100 billion.
  5. Calculating the Ratio:
      $\frac{D}{D_{\text{scale}}} = \frac{\$4\text{ trillion}}{\$100\text{ billion}} = 40$.
     Since $\exp(-40)$ is negligibly small, we get:
      $L \approx L_{\text{max}}$.
  6. Revised Estimate:
     Given the uncertainties, choosing a mid-range $L_{\text{max}}$ of about 12 million yields a revised Fermi estimate of roughly 12 million lives saved per year, under the assumption that everyone in the West donates 10% of their yearly income to EA-related charities (a minimal Squiggle sketch of this model follows the list).

Summary 

This Fermi estimate suggests that if everyone in the West donated 10% of their yearly income to highly effective charities, we could save around 12 million lives per year. While you might think throwing $4 trillion at the problem would save way more people, the reality is that we'd quickly run into practical limits. Even the best charities can only scale up so much before they hit barriers like logistical challenges, administrative bottlenecks, and running out of the most cost-effective interventions. Still, saving 12 million lives every year is pretty mind-blowing and shows just how powerful coordinated, effective giving could be if we actually did it.

Technique

I brainstormed with Claude Sonnet for about 20 minutes, asking it to generate potential Fermi questions in batches of 20. I did this a few times, rejecting most questions for being too boring or not tractable enough, until it generated the one I used. I ran the question by o3-mini and had to correct its reasoning here and there until it produced a good line of reasoning. Then I fed that output into a different instance of o3-mini and asked it to review the Fermi estimate above and point out flaws. I put that output back into the original o3-mini, and it gave me the model output above.

 

-

I think a high-quality reasoning model (such as o3), combined with other LLMs acting as "critics", could generate very high-quality Fermi estimates. Also, LLMs can generate ideas far faster than any human can, but humans can evaluate the quality of those ideas in a fraction of a second. An underexplored idea is to use an LLM to generate dozens or hundreds of ideas about how to solve a particular problem, and have a human do the filtering and select the best ones. I can see authors using this, telling their LLM "give me 100 interesting ways I could end this story" and picking the best one.

comment by ozziegooen · 2025-02-03T21:18:17.626Z · LW(p) · GW(p)

Related Manifold question here: