It looks like this is a linkpost to:
Might Leopold Aschenbrenner also be involved? He runs an investment fund with money from Nat Friedman, Daniel Gross, and Patrick Collison, so the investment in Mechanize might have come from that?
Does this match your understanding?
| AI Company | Public/Preview Name | Hypothesized Base Model | Hypothesized Enhancement | Notes |
|---|---|---|---|---|
| OpenAI | GPT-4o | GPT-4o | None (Baseline) | The starting point, multimodal model. |
| OpenAI | o1 | GPT-4o | Reasoning | First reasoning model iteration, built on the GPT-4o base. Analogous to Anthropic's Sonnet 3.7 w/ Reasoning. |
| OpenAI | GPT-4.1 | GPT-4.1 | None | An incremental upgrade to the base model beyond GPT-4o. |
| OpenAI | o3 | GPT-4.1 | Reasoning | Price/cutoff suggest it uses the newer GPT-4.1 base, not GPT-4o + reasoning. |
| OpenAI | GPT-4.5 | GPT-4.5 | None | A major base model upgrade. |
| OpenAI | GPT-5 | GPT-4.5 | Reasoning | "GPT-5" might be named this way, but technologically be GPT-4.5 + Reasoning. |
| Anthropic | Sonnet 3.5 | Sonnet 3.5 | None | Existing model. |
| Anthropic | Sonnet 3.7 w/ Reasoning | Sonnet 3.5 | Reasoning | Built on the older Sonnet 3.5 base, similar to how o1 was built on GPT-4o. |
| Anthropic | N/A (Internal) | Newer Sonnet | None | Internal base model analogous to OpenAI's GPT-4.1. |
| Anthropic | N/A (Internal) | Newer Sonnet | Reasoning | Internal reasoning model analogous to OpenAI's "o3". |
| Anthropic | N/A (Internal) | Larger Opus | None | Internal base model analogous to OpenAI's GPT-4.5. |
| Anthropic | N/A (Internal) | Larger Opus | Reasoning | Internal reasoning model analogous to hypothetical GPT-4.5 + Reasoning. |
| Google | N/A (Internal) | Gemini 2.0 Pro | None | Plausible base model for Gemini 2.5 Pro according to the author. |
| Google | Gemini 2.5 Pro | Gemini 2.0 Pro | Reasoning | Author speculates it's likely Gemini 2.0 Pro + Reasoning, rather than being based on a GPT-4.5-scale model. |
| Google | N/A (Internal) | Gemini 2.0 Ultra | None | Hypothesized very large internal base model. Might exist primarily for knowledge distillation (Gemma 3 insight). |
I actually ended up listening to this episode and found it quite high-signal. Lex kept his peace-and-love-kumbaya stuff to a minimum, and Dylan and Nathan went quite deep on specifics like the innovations in DeepSeek V3/R1/R1-Zero, plus hardware and export controls.
Matt Levine, in response to:
If you lie to board members about other board members in an attempt to gain control over the board, I assert that the board should fire you, pretty much no matter what
No! Wrong! Not no matter what! In a normal company with good governance, absolutely. Lying to the board is the main bad thing that the CEO can do, from a certain perspective. But there are definitely some companies — Elon Musk runs like eight of them, but also OpenAI — where, if you lie to board members about other board members in an attempt to gain control over the board, the board members you lie about should probably say “I’m sure that deep down this is our fault, we’re sorry we made you lie about us, we’ll see ourselves out.”
To be clear, I am very sympathetic to the OpenAI board’s confusion. This was not a simple dumb mistake. They did not think “we are the normal board of a normal public company, and we have to supervise our CEO to make sure that he pursues shareholder value effectively.” This was a much weirder and more reasonable mistake. They thought “we are the board of a nonprofit set up to pursue the difficult and risky mission of achieving artificial general intelligence for the benefit of humanity, and we have to supervise our CEO to make sure he does that.” Lying to the board seems quite bad as a matter of, you know, AI misalignment.
Am I correct in thinking that you posted this a couple of days ago (with a different title - now deleted), and this version has no substantial changes?
Another good blog:
https://nintil.com/mistakes
The 200k GPU number has been mentioned since October (Elon tweet, Nvidia announcement), so are you saying that the speed at which they managed to train the model is what beat the predictions you heard?
I met someone in SF doing this but cannot remember the name of the company! If I remember I'll let you know
One idea I thought would be cool related to this is to have several LLMs with different 'personalities', each giving a different kind of feedback, e.g. a 'critic', an 'aesthete', a 'layperson'. Just as in Google Docs you get comments from different people, here you'd get inline feedback from different kinds of readers.
There is usually a Google Sheet export of the Swapcard data provided, which makes this easier - but at previous conferences other attendees were apprehensive when informed that people were doing this
Haven't used it much, but dexa.ai tries to let you interact with podcast episodes; here's this episode:
What do you make of Hynix?
There is a very good Rationally Speaking podcast episode about this - one solution proposed by economist Ami Glazer is not to restrict pricing, but instead to issue vouchers or cash to those who need it. Glazer points out that this is how the food stamp system works at present.
That episode goes into other topics around this issue, like hoarding, rationing, and positive externalities (e.g. face masks protect not just the wearer but those around them).
A bit of a tangent, but economist Alex Tabarrok has talked about buying coal mines in order to not mine coal
One of the challenges until recently (as outlined in that link) was:
There are also some crazy “use it or lose it” laws that say that you can’t buy the right to extract a natural resource and not use it. When the high-bidder for an oil and gas lease near Arches National Park turned out to be an environmentalist the BLM cancelled the contract!
This is another one that was doing the rounds in the UK progress / YIMBY / growth space:
How interesting - I was curious about copyright etc., but this is annotated by the author himself!
Base rates and historical context: these are debated in this highly-upvoted post.
I don't think this post deserves to be downvoted so much (currently sitting at -11)
Even if one disagrees with the main thesis, it's not a low-quality post, and does add to the debate