Posts

Contra papers claiming superhuman AI forecasting 2024-09-12T18:10:50.582Z
Unit economics of LLM APIs 2024-08-27T16:51:22.692Z
[EAForum xpost] A breakdown of OpenAI's revenue 2024-07-10T18:09:20.017Z
[EA xpost] The Rationale-Shaped Hole At The Heart Of Forecasting 2024-04-02T17:40:44.278Z
Metaculus is seeking Software Engineers 2022-11-05T00:42:24.909Z

Comments

Comment by dschwarz on Growing Up is Hard · 2024-12-26T04:25:41.272Z · LW · GW

9 years since the last comment - I'm interested in how this argument interacts with GPT-4 class LLMs, and "scale is all you need".

Sure, LLMs are not evolved in the same way as biological systems, so the path towards smarter LLMs isn't fragile in the way brains are described in this article, where maybe the first augmentation works, but the second leads to psychosis.

But LLMs are trained on writing done by biological systems with intelligence that was evolved with constraints.

So what does this say about the ability to scale up training on this human data in an attempt to reach superhuman intelligence?

Comment by dschwarz on Contra papers claiming superhuman AI forecasting · 2024-09-12T18:56:46.179Z · LW · GW

Thank you for the careful look into data leakage in the other thread! Some of your findings were subtle, and these are very important details.

Comment by dschwarz on AI forecasting bots incoming · 2024-09-12T18:34:16.548Z · LW · GW

Instead of writing a long comment, we wrote a separate post that, like @habryka and Daniel Halawi did, looks into this carefully. We re-read all 4 papers that made these misleading claims this year and show where they fall short.

https://www.lesswrong.com/posts/uGkRcHqatmPkvpGLq/contra-papers-claiming-superhuman-ai-forecasting 

Comment by dschwarz on [EAForum xpost] A breakdown of OpenAI's revenue · 2024-07-11T17:17:53.684Z · LW · GW

Good point. For this public report, we manually checked all the data points that were included here. FutureSearch threw out many other unreliable data points it couldn't corroborate; that's a core part of what it does.

The sources linked here are low-quality data brokers due to a bug - there is a higher-quality data source corroborating the data, but FutureSearch doesn't cite the higher-quality one.

We're working on fixing this, and identifying all primary vs. secondary sources.

Comment by dschwarz on [EAForum xpost] A breakdown of OpenAI's revenue · 2024-07-11T17:14:46.961Z · LW · GW

All of the research was done by FutureSearch, so AI, with a few exceptions, such as https://app.futuresearch.ai/reports/3Li1?nodeId=MIw9, where it couldn't infer good team/enterprise ratios from analogous products where numbers were reliable. Estimating ChatGPT Teams subscribers was the hardest part, requiring the most judgment.

Most of the final words in the report were written or revised by humans. We set a high quality bar for publishing this publicly, and intervened by hand more than we normally do.

Comment by dschwarz on [EA xpost] The Rationale-Shaped Hole At The Heart Of Forecasting · 2024-04-04T14:01:02.361Z · LW · GW

(Responded to the version of this on the EA Forum post.)

Comment by dschwarz on Are language models good at making predictions? · 2023-11-06T14:19:08.790Z · LW · GW

Great post!

| Manifold markets that were resolved after GPT-4’s current knowledge cutoff of Jan 1, 2022

Were you able to verify that newer knowledge didn't bleed in? Anecdotally, GPT-4 can report different cutoff dates, depending on the API. And there is anecdotal evidence that GPT-4-0314 occasionally knows about major world events after its training window, presumably from RLHF?

This could explain the better scores on politics than science.

Comment by dschwarz on Some research ideas in forecasting · 2022-11-15T23:32:34.215Z · LW · GW

Nice post! I'll throw another signal boost for the Metaculus hackathon that OP links, since this is the first time Metaculus is sharing their whole 1M db of individual forecasts (not just the db of questions & resolutions which is already available). You have to apply to get access though. I'll link it again even though OP already did: https://metaculus.medium.com/announcing-metaculuss-million-predictions-hackathon-91c2dfa3f39

There are nice cash prizes too.

As the OP writes, I think most of the ideas here would be valid entries in the hackathon, though the emphasis is on forecast aggregation & methods for scoring individuals. I'm particularly interested in the decay-of-predictions idea. I don't think we know how well predictions age, or what the right strategy is for updating your predictions on long-running questions.

Comment by dschwarz on Human errors, human values · 2011-04-13T02:49:24.788Z · LW · GW

I have to respectfully disagree with your position. Kant's point, and the point of similar people who make the sweeping universalizations that you dislike, is that it is only in such idealized circumstances that we can make rational decisions. What makes a decision good or bad is whether it would be the decision rational people would endorse in a perfect society.

The trouble is not moving from our flawed world to an ideal world. The trouble is taking the lesson we've learned from considering the ideal world and applying it to the flawed world. Kant's program is widely considered to be a failure because it fails to provide real guidelines for the real world.

Basically, my point is that asking the Rawlsian "Would you prefer to live in a society where people do X" is valid. However, one may answer that question with "yes" and still rationally refrain from doing X. So your general point, that local and concrete decisions rule the day, still stands. Personally, though, I try to approach local and concrete decisions the way that Rawls does.