> they put substantial probability on the trend being superexponential
I think that's too speculative.
> I also think that around 25-50% of the questions are impossible or mislabeled.
I wouldn't be surprised if 3-5% of questions were mislabeled or impossible to answer, but 25-50%? You're basically saying that HLE is worthless. I'm curious why. I mean, I don't know much about the people who had to sift through all of the submissions, but I'd be surprised if they failed that badly. Plus, there was a "bug bounty" aimed at improving the quality of the dataset.
> TBC, my median to superhuman coder is more like 2031.
Guess I'm a pessimist then; mine is more like 2034.
I don't have one knock-down counterargument for why the timelines will be much longer, so here's a whole lot of convincing-but-not-super-convincing counterarguments:
- This contradicts METR's timelines, which are, IMO, the best piece of info we currently have for predicting when AGI will arrive.
- Microsoft is not going to fund Stargate. At best, it means Stargate will be delayed. At worst, it means it will be axed. For the timelines in this post to be accurate, Stargate would have to be mostly finished right now, today. Even if OpenAI could literally print money, large data centers take 2-6 years to build.
- "Moreover, it [Agent-1] could offer substantial help to terrorists designing bioweapons, thanks to its PhD-level knowledge of every field and ability to browse the web."
"Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research."
Considering that today's frontier LLMs can solve at most 20% of problems on Humanity's Last Exam, both of these predictions appear overly optimistic to me. And HLE isn't even about autonomous research; it's about "closed-ended, verifiable questions". Even if some LLM scored >90% on HLE by late 2025 (I bet this won't happen), that wouldn't automatically imply that it's good at open-ended problems with no known answer. Present-day LLMs have so little agency that it's not even worth talking about.
EDIT: for the record, I expect 45-50% accuracy on HLE by the end of 2025, 70-75% by the end of 2026, and 85-90% by the end of 2027.
- A memory module that can be stored externally (on a hard drive) is handwaved as something that Just Works™; I don't expect it to be so easy.
- As of today, there is no robust anti-hallucination/error correction mechanism for LLMs. It seems like another thing that is handwaved as something that Just Works™: just beat the neural net with the RLHF stick until the outputs look about right.
- Imagine writing a similar piece for videogames in 2016, after AlphaGo became news. If someone wrote that by 2018-2019 all mainstream videogames would use neural nets trained using RL, they would be laughably wrong. Hell, if someone wrote that but replaced 2018-2019 with 2024-2025, they would still be wrong.
This is the least convincing argument of all of these; it's just my way of saying, "I don't feel like reality actually, really works this way". On that note, I also expect that recursive self-improvement requires a completely new architecture that not only doesn't look like Transformers but doesn't even look like a neural network.
Not a criticism, but I think you overlooked a very interesting possibility: developing a near-perfect speech-to-text transcription AI and transcribing all of YouTube. The biggest issue with training multi-modal models is acquiring the right ("paired") training data. If YouTube had 99.9% accurate subtitles for every video, this would no longer be a problem.
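To make the transcription step concrete, here is a minimal sketch using the open-source `openai-whisper` package. The model name and file path are placeholders, and today's off-the-shelf models are nowhere near the 99.9% accuracy I'm describing; a real pipeline would also need to extract audio from the videos first.

```python
# Minimal sketch of transcribing one audio file with openai-whisper.
# The checkpoint name and file path below are illustrative placeholders.
import whisper

# Load a pretrained speech-to-text model (larger checkpoints are more accurate but slower).
model = whisper.load_model("large")

# Transcribe a single audio track; the result contains the full text plus timestamped segments.
result = model.transcribe("some_video_audio.mp3")
print(result["text"])
```

Scaled across YouTube, the output text paired with the original audio/video would be exactly the kind of "paired" data I mean.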
You might be interested in reading about aspiration adaptation theory: https://www.sciencedirect.com/science/article/abs/pii/S0022249697912050
To me, the most appealing part of it is that goals are incomparable and multiple goals can be pursued at the same time, without the need for a function that aggregates them and assigns a single value to a combination of goals.
I'm quite late (the post was made 4 years ago), and I'm also new to LessWrong, so it's entirely possible that other, more experienced members will find flaws in my argument.
That being said, I have a very simple, short and straightforward explanation of why rationalists aren't winning.
Domain-specific knowledge is king.
That's it.
If you are a programmer and your code keeps throwing errors at you, then no matter how many logical fallacies and cognitive biases you can identify and name, posting your code on stackoverflow is going to provide orders of magnitude more benefit.
If you are an entrepreneur and you're trying to start your new business, then no matter how many hours you spend assessing your priors and calibrating your beliefs, it's not going to help you nearly as much as being able to tell a good manager apart from a bad manager.
I'm not saying that learning rationality isn't going to help at all; rather, I'm saying that the impact of learning rationality on your chances of success will be many times smaller than the impact of learning domain-specific knowledge.
Ok, thank you for the clarification!
I'm very new to Less Wrong in general, and to Eliezer's writing in particular, so I have a newbie question.
> any more than you've ever argued that "we have to take AGI risk seriously even if there's only a tiny chance of it" or similar crazy things that other people hallucinate you arguing.

> just like how people who helpfully try to defend MIRI by saying "Well, but even if there's a tiny chance..." are not thereby making their epistemic sins into mine.
I've read AGI Ruin: A List of Lethalities, and I legitimately have no idea what is wrong with "we have to take AGI risk seriously even if there's only a tiny chance of it". What is wrong with it? If anything, this seems like something I would say if I had to explain the gist of AGI Ruin: A List of Lethalities to someone else very briefly and using very few words.
The fact that I have absolutely no clue what is wrong with it probably means that I'm still very far from understanding anything about AGI and Eliezer's position.