Open Source LLM Pokémon Scaffold
post by Julian Bradshaw · 2025-04-27T00:57:56.379Z · LW · GW · 0 comments
This is a link post for https://github.com/cicero225/llm_pokemon_scaffold
This is a cleaned-up, open-source version of the LLM Pokémon Scaffold described in Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red [LW · GW]. It was forked from David Hershey of Anthropic's scaffold; all development on top of that was done by my friend, not me.
Since that post, a number of changes have been made to the scaffold. The major ones are:
- Instead of using colored squares on the game screenshots, information is printed as text, e.g. "Impassable", "Explored", "Check Here"
- Models are seemingly helped by putting relevant information blatantly in the spot they need to see it, rather than indirectly via a legend or instructions or whatever
For some reason it helps if you write "CHECK HERE" on every unexplored tile.
- Automatically-updating ASCII collision map given to LLM
- Generated by code
- Uses numbers indicating how many moves away each tile is
Behold, Pewter City.
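The scaffold's actual map code lives in the linked repository. As a rough illustration of the idea only, here is a minimal sketch of how a "moves away" map could be produced with breadth-first search over a collision grid. The grid layout, the function names `distance_map` and `render_ascii`, and the `##`/`??` symbols are all assumptions made for this example, not the scaffold's real format.

```python
from collections import deque


def distance_map(walkable, start):
    """Breadth-first search: number of moves from start to each reachable tile.

    walkable is a list of rows of booleans (hypothetical format);
    start is a (row, col) tuple on a walkable tile.
    Unreachable or impassable tiles stay None.
    """
    rows, cols = len(walkable), len(walkable[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[start[0]][start[1]] = 0
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and walkable[nr][nc] and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def render_ascii(walkable, dist):
    """Render the grid as text: '##' impassable, '??' unreachable, else distance."""
    lines = []
    for r, row in enumerate(walkable):
        cells = []
        for c, ok in enumerate(row):
            if not ok:
                cells.append("##")
            elif dist[r][c] is None:
                cells.append("??")
            else:
                cells.append(f"{dist[r][c]:2d}")
        lines.append(" ".join(cells))
    return "\n".join(lines)


# Toy 3x4 map (made up for illustration); the player stands at the top-left.
grid = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]
dist = distance_map(grid, (0, 0))
print(render_ascii(grid, dist))
```

Because the text is regenerated from the grid each time, the map can be refreshed automatically whenever the player moves or a new tile is discovered.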
- Improved prompts for "Critique Claude"/"Guide Gemini"/"Oversight o3"
- Prompt 1: Given a bunch of facts about the current game state and instructions on what is trustworthy and what's not, make a summary
- This is an attempt to get the model to grasp reality better by telling it which sources of information it should basically always trust (data from the game's RAM), mostly trust (its own knowledge of the game from training), not trust (map labels it made itself), and mostly distrust (its own vision)
- Prompt 2: Look at output from prompt 1 and try to remove inconsistencies
- Prompt 3: OK now talk to the model you're critiquing
- Models encouraged to use a "mark_checkpoint" tool to maintain a running list of major checkpoints (Left House, Beat Misty, died to Brock, etc.)
- A "detailed_navigation" tool which, if called, hands off to an alternate model that basically rolls around trying to explore, depth-first search (DFS) style; it isn't told what the goal is, but it is told to talk to NPCs and exit maps
- Autopathing tool that can travel to known coordinates on the map
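For a sense of what an autopathing tool involves, here is a minimal sketch that finds a shortest route to a known coordinate and turns it into button presses. It assumes a simple boolean collision grid and uses breadth-first search; the name `find_path`, the button names, and the grid format are hypothetical and are not taken from the scaffold's code.

```python
from collections import deque

# Hypothetical button names mapped to (row delta, col delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find_path(walkable, start, goal):
    """Return the shortest list of button presses from start to goal.

    walkable is a list of rows of booleans; start and goal are (row, col).
    Returns [] if already at the goal and None if the goal is unreachable.
    """
    rows, cols = len(walkable), len(walkable[0])
    came_from = {start: None}  # tile -> (previous tile, button pressed)
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if pos == goal:
            break
        for button, (dr, dc) in MOVES.items():
            nxt = (pos[0] + dr, pos[1] + dc)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and walkable[nxt[0]][nxt[1]] and nxt not in came_from:
                came_from[nxt] = (pos, button)
                queue.append(nxt)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, then reverse the presses.
    presses = []
    pos = goal
    while came_from[pos] is not None:
        pos, button = came_from[pos]
        presses.append(button)
    presses.reverse()
    return presses


# Toy 3x4 map (made up for illustration).
grid = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]
route = find_path(grid, (0, 0), (0, 3))
print(route)
```

A tool like this only works for tiles the collision map already knows about, which is consistent with the tool being limited to known coordinates.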
All of this helps somewhat but doesn't make LLMs amazing at Pokémon by any means.