Comments

Comment by MrCheeze on Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red · 2025-04-22T19:03:01.991Z · LW · GW

Text adventures do seem like a good eval right now, since they're the ONLY games that can be tested without either relying on vision (which is still very bad), or writing a custom harness for each game (in which case your results depend heavily on the harness).

Comment by MrCheeze on Is Gemini now better than Claude at Pokémon? · 2025-04-21T20:54:25.516Z · LW · GW

(Gemini did actually write much of the Gemini_Plays_Pokemon scaffolding, but only in the sense of doing what David told it to do, not designing and testing it.)

I think you're right that an LLM coding its own scaffolding is probably more achievable than one playing the game like a human, but I don't think current models can do it. Watching the streams, the models don't seem to understand their own flaws, although admittedly they haven't been prompted to focus on this.

Comment by MrCheeze on Is Gemini now better than Claude at Pokémon? · 2025-04-21T13:05:27.742Z · LW · GW

On the other hand, Claude has (arguably) a better pathfinding tool. As long as it requests to be moved to a valid set of coordinates from the screenshot overlay grid, the tool will move it there. Gemini mostly navigates on its own, although it has access to another instance of Gemini dedicated just to pathfinding.

I would very much dispute this. Claude's navigator tool can only navigate to coordinates that are onscreen, meaning that the main model needs to have some idea of where it's going. That means grappling with problems that are extremely difficult for both models, such as "go AROUND the wall instead of right through it".

In contrast, the Gemini pathfinder tool can travel to a coordinate halfway across the map, totally bypassing that problem. (Yes, the pathfinder is technically another instance of Gemini, but it's been prompted with exactly what algorithm to follow, so this is not a major handicap.) When returning to a previously visited map - Gemini is banned from using the pathfinder tool to enter unexplored tiles - it can probably traverse even mazes that take the Claude scaffolding all day, in just one or two turns.

Of course this has further advantages for maintaining coherence, since if you spend all day on a maze, you forget what your plan even was after you get to the end of it.
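
To make the "go AROUND the wall" point concrete, here is a minimal sketch of the kind of algorithm a pathfinder can follow on a tile map. It is an assumption for illustration only: neither stream's actual tool is shown here, and the grid format, the `#` wall marker and the name `find_path` are all hypothetical.

```python
from collections import deque


def find_path(grid, start, goal):
    """Breadth-first search over walkable tiles.

    grid  -- list of equal-length strings; '#' marks a wall (hypothetical format)
    start -- (row, col) tuple
    goal  -- (row, col) tuple
    Returns the shortest list of tiles from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        r, c = current
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = current
                queue.append((nr, nc))
    return None


# A wall in the middle column with a single gap on the bottom row.
GRID = [
    "..#..",
    "..#..",
    ".....",
]
path = find_path(GRID, (0, 0), (0, 4))
```

The search never tries to walk through the wall; it just finds the gap. A tool like this only needs a destination, whereas a tool limited to onscreen coordinates leaves the route planning to the main model.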

Comment by MrCheeze on Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red · 2025-04-21T12:36:58.151Z · LW · GW

I have not tested if Gemini can distinguish this tree (and intend to eventually). This may very well be the only reason Gemini has progressed further.

You missed an important fact about the Gemini stream, which is that it just reads the presence of these trees from RAM and labels them for the model (along with a few other special tiles like ledges and water). Nevertheless I do think Gemini's vision is better, by which I mean if you provide it a screenshot it will sometimes identify the correct tree, unlike Claude who will never do so. (Although to my knowledge the Gemini in the stream has literally never used vision for anything.) And in general the Gemini streamer is far more liberal about updating the scaffolding to address challenges than the Claude streamer is.

Also there's one other reason that Gemini has gotten farther: it simply has the whole walkthrough of the game memorized, while Claude doesn't know what to do after the thunderbadge. (I don't think either model would be remotely competent on RPGs that aren't in the training data.)

This doesn't mean memory is not a problem. The problems are just more subtle than one might imagine. For instance, the lack of direct memory means models lack a real sense of time, or how hard a task is. That means even when given a notepad to record observations, they will not consistently record "HOW TO SOLVE THAT PUZZLE THAT TOOK FOREVER" because they don't realize it took forever. And of course if it's not written down it falls completely out of "long-term" memory.

This has been a recurring problem with the Claude stream, where the model is given the ability to take notes. Whenever he's struggling and failing to solve a problem for a long time, he'll endlessly write notes about his (wrong) ideas for what to do, reinforcing that behaviour. When he finally tries the right thing, it seems like it was easy, so you MIGHT get one note written down about it. If you're lucky.

In general, however incompetent this post makes it sound like the models are at playing the game, they're even worse than that. I feel like this is in large part because of LLMs having frozen weights - every single mistake that they make will be repeated every time the situation reoccurs, instead of just once as a human would do. Taking notes doesn't help this very much, as their basic instincts being wrong seems to make far more difference than what's in their notes.

Comment by MrCheeze on So how well is Claude playing Pokémon? · 2025-03-08T14:36:51.518Z · LW · GW

And now in the second run it has entered a similar delusional loop. It knows the way to Cerulean City is via Route 4, but the stretches of road before and after Mt. Moon are both considered part of Route 4. Therefore it deluded itself into thinking it can get to Cerulean from the first part of the route. Because of that, every time it accidentally stumbles into Mt. Moon and is making substantial progress towards the exit, it intentionally blacks out to get teleported back outside the entrance, so it can look for the nonexistent path forwards.

From what I've seen on stream, the chances of it questioning and breaking from this delusion are basically zero. There's still the possibility of progress by getting lost in Mt. Moon and stumbling into the exit, but it will never actually figure out what it was doing wrong here.

People in the stream chat and subreddit have been discussing this paper suggesting that LLM agents often get into these "meltdown" loops that they aren't able to recover from: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j65jqf/vendingbench_a_benchmark_for_longterm_coherence

Also, the stream admin seemed to think the same thing, saying during the first run that "some runs just are cursed" and setting up a poll for whether to reset the game.

Comment by MrCheeze on So how well is Claude playing Pokémon? · 2025-03-08T14:19:10.536Z · LW · GW

Note that the creator stated that the setup is intentionally somewhat underengineered:

I do not claim this is the world's most incredible agent harness; in fact, I explicitly have tried not to "hyper engineer" this to be like the best chance that exists to beat Pokemon. I think it'd be trivial to build a better computer program to beat Pokemon with Claude in the loop.

This is like meant to be some combination of like "understand what Claude's good at and Benchmark and understand Claude-alongside-a-simple-agent-harness", so what that boils down to is this is like a pretty straightforward tool-using agent.

Comment by MrCheeze on So how well is Claude playing Pokémon? · 2025-03-07T12:27:45.171Z · LW · GW

This basically sums up how it's doing: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j568ck/the_mount_moon_experience

Of course much of that is basic capability issues - poor spatial reasoning, short-term memory that doesn't come anywhere close to lasting for 1 lap, etc.

But I've also noticed ways in which Claude's personality is sabotaging it. Claude is capable of taking notes saying that it "THOROUGHLY confirmed NO passages" through the eastern barrier - but never gets impatient or frustrated, so this doesn't actually prevent it from trying the same thing every time it sees the eastern wall again.

And in general, it seems to have a strong bias towards visiting places that are mentioned frequently in its notes - even though that's the exact opposite of what you should be doing for exploration. I've seen it reach the rarely reached second ladder on the floor, and then promptly decide it needs to run back to the first ladder (which it has seen hundreds of times) to see whether the first ladder goes anywhere.
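
As a rough illustration of what "the exact opposite" would look like, a count-based exploration rule prefers the place that shows up least often in the history. This is a toy sketch, not anything the stream's harness does; `pick_exploration_target` and the log format are made up for the example.

```python
from collections import Counter


def pick_exploration_target(candidates, visit_log):
    """Return the candidate that appears least often in the visit log.

    candidates -- places currently reachable (hypothetical labels)
    visit_log  -- one entry per past visit or note mentioning a place
    Ties go to whichever candidate is listed first.
    """
    counts = Counter(visit_log)
    # Counter returns 0 for places never seen, so unseen places win outright.
    return min(candidates, key=lambda place: counts[place])


visit_log = ["first ladder"] * 300 + ["second ladder"]
target = pick_exploration_target(["first ladder", "second ladder"], visit_log)
```

With this rule the ladder seen hundreds of times loses to the one seen once, which is the reverse of the bias described above.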

And it should definitely be mentioned that run #1 was mercy killed when its knowledge base was populated almost entirely with falsehoods both about how far it had progressed in the game and how to get further, leading to a singleminded obsession with exploring the southern wall of Cerulean City forever.

Comment by MrCheeze on Why I'm doing PauseAI · 2024-05-02T01:07:06.451Z · LW · GW

"Under development" and "currently training" I interpret as having significantly different meanings.

Comment by MrCheeze on The ‘ petertodd’ phenomenon · 2023-04-15T17:49:44.624Z · LW · GW

Doesn't strike me as inevitable at all, just a result of OpenAI following similar methods for creating their tokenizer twice. (In both cases, leading to a few long strings being included as tokens even though they don't actually appear frequently in large corpuses.)

They presumably had already made the GPT-4 tokenizer long before SolidGoldMagikarp was discovered in the GPT-2/GPT-3 one.
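
A toy way to picture this: if a string is in the vocabulary but essentially never shows up in the text the model is trained on, the model gets no chance to learn what that token means. The sketch below is only an illustration under that assumption. It uses plain substring counting rather than a real BPE tokenizer, and the function name, vocabulary and sample text are all invented.

```python
def find_undertrained_tokens(vocab, training_text, min_count=1):
    """Return vocabulary entries seen fewer than min_count times in the text.

    vocab         -- list of token strings (hypothetical, not a real vocabulary)
    training_text -- stand-in for the corpus the model was trained on
    """
    return [tok for tok in vocab if training_text.count(tok) < min_count]


vocab = [" the", " Pokemon", " SolidGoldMagikarp"]
training_text = "I caught the Pokemon near the cave."
rare = find_undertrained_tokens(vocab, training_text)
```

Here only the long, unusual string is flagged, which matches the pattern of a few long strings ending up as tokens despite being rare in the corpus.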

Comment by MrCheeze on The ‘ petertodd’ phenomenon · 2023-04-15T12:46:52.055Z · LW · GW

Prior to OpenAI's 2023-02-14 patching of ChatGPT (which seemingly prevents it from directly encountering glitch tokens like ‘ petertodd’)

I've never seen it mentioned around here, but since that update, ChatGPT is using a different tokenizer that has glitch tokens of its own:

https://github.com/openai/tiktoken/blob/46287bfa493f8ccca4d927386d7ea9cc20487525/tiktoken/model.py#L16

https://wetdry.world/@MrCheeze/110130795421274483

Comment by MrCheeze on Rationality Quotes September 2012 · 2013-01-26T03:10:32.520Z · LW · GW

I'd say this captures the spirit of Less Wrong perfectly.

Comment by MrCheeze on Unbounded Scales, Huge Jury Awards, & Futurism · 2012-10-28T00:48:19.245Z · LW · GW

500 years still sounds optimistic to me.

Comment by MrCheeze on AI timeline predictions: are we getting better? · 2012-09-10T02:46:25.829Z · LW · GW

The key is in the phrase "much more complicated". The sort of algorithm that could become a mind would be an enormous leap forward in comparison to anything that has ever been done so far.

Comment by MrCheeze on AI timeline predictions: are we getting better? · 2012-08-30T18:46:54.927Z · LW · GW

Man, people's estimations seem REALLY early. The idea of AI in fifty years seems almost absurd to me.

Comment by MrCheeze on Math is Subjunctively Objective · 2012-08-12T02:31:23.472Z · LW · GW

I still stand by my belief that 2 + 3 = 5 does not in fact exist, and yet it is still true that adding two things to three things will always result in five things.

Comment by MrCheeze on How to Seem (and Be) Deep · 2012-06-04T00:54:08.450Z · LW · GW

"I think that if you took someone who was immortal, and asked them if they wanted to die for benefit X, they would say no."

This doesn't help against arguments that stable immortality is impossible or incredibly unlikely, of course, but I suppose those aren't the arguments you were countering at the time.

Comment by MrCheeze on Pascal's Mugging: Tiny Probabilities of Vast Utilities · 2012-04-23T22:33:33.449Z · LW · GW

Yes, but the chance of magic powers from outside the matrix is low enough that what he says makes an insignificant difference.

...or is an insignificant difference even possible?

Comment by MrCheeze on SIAI - An Examination · 2011-10-11T00:31:00.914Z · LW · GW

Hm. I'd rather have seen more of the analysis on whether what they do with the money is useful, but this is something.

Comment by MrCheeze on "Stuck In The Middle With Bruce" · 2011-10-09T23:22:45.304Z · LW · GW

Hmm, didn't really get anything out of this. Maybe you need to be able to be competent at stuff in the first place to sabotage yourself?

Comment by MrCheeze on Prisoner's Dilemma Tournament Results · 2011-09-08T00:52:07.386Z · LW · GW

If this ever happens again I'd make one for the long-term evolutionary one that tries to learn strategies U-style, and then remembers what it learned in future rounds. If that's allowed.

Comment by MrCheeze on On the unpopularity of cryonics: life sucks, but at least then you die · 2011-08-31T04:21:36.107Z · LW · GW

Shouldn't priority be given to improving quality of lives first?

Comment by MrCheeze on An Outside View on Less Wrong's Advice · 2011-07-12T14:49:24.699Z · LW · GW

It makes me sad because it means smart people aren't doing things that are actually useful.

Comment by MrCheeze on Decoherence is Simple · 2011-03-16T15:19:41.915Z · LW · GW

This isn't quite what your post was about, but one thing I've never understood is how anyone could possibly find "the universe is totally random" to be a MORE simple explanation.

Comment by MrCheeze on To Spread Science, Keep It Secret · 2011-02-07T21:20:12.114Z · LW · GW

Well this explains a lot.

Comment by MrCheeze on Counterfactual Calculation and Observational Knowledge · 2011-02-07T20:58:28.701Z · LW · GW

The thing is, the other world was chosen specifically BECAUSE it had the opposite answer, not randomly like the world you're in.

Comment by MrCheeze on Why Truth? · 2011-01-31T03:49:20.658Z · LW · GW

Maybe in ninety-eight universes out of 100 it does blow up and we just see the one that's left; and he's actually giving an accurate number. :P

Comment by MrCheeze on Not Taking Over the World · 2011-01-31T03:35:08.913Z · LW · GW

"Give it to you" is a pretty lame answer but I'm at least able to recognise the fact that I'm not even close to being a good choice for having it.

That's more or less completely ignoring the question but the only answers I could ever come up with at the moment are what I think you call cached thoughts here.

Comment by MrCheeze on 31 Laws of Fun · 2011-01-31T03:08:21.889Z · LW · GW

Well it's good to see that if you somehow found a way to implement your ideas you would at least do it well.

Comment by MrCheeze on You Only Live Twice · 2011-01-23T23:43:59.553Z · LW · GW

retracted

Comment by MrCheeze on The Moral Void · 2010-12-12T18:31:26.644Z · LW · GW

So... the correct answer is to dissolve the question, yes?

Comment by MrCheeze on The Hero With A Thousand Chances · 2010-12-10T21:54:32.837Z · LW · GW

I had trouble reading it too. If you really don't want to do it like that, then at least just take out all the quotes except for at the very beginning and end of the speech (no quotes at all between paragraphs).

Comment by MrCheeze on Two Cult Koans · 2010-12-02T21:27:51.570Z · LW · GW

Okay, I have no idea whatsoever what this is supposed to be saying.

EDIT: Wait, hold on. Is it supposed to not make sense?

Comment by MrCheeze on The Sword of Good · 2010-12-02T21:21:48.874Z · LW · GW

Loved the story, and it's also the first time I took your strong atheism completely seriously, but I think that one bit where they stab those three sleeping guys went a bit too strongly to the "no, this definitely isn't right" side of things. Although I didn't think about that scene at all when I was trying to figure out which side was the Good side, and thought about the death of Alek as my main piece of evidence for the "Lord of Dark is Bad" possibility, so that's something.