Posts

Large Language Models can Strategically Deceive their Users when Put Under Pressure. 2023-11-15T16:36:04.446Z

Comments

Comment by ReaderM on Modern Transformers are AGI, and Human-Level · 2024-03-31T00:18:15.385Z · LW · GW

Not really. The majority of your experiences and interactions are forgotten and discarded; the few that aren't are recalled, triggered by the right input when necessary, rather than just sitting there in your awareness at all times. Those memories are also modified with every recall.

And that's really beside the point anyway. However you want to spin it, evaluating that many positions is not necessary for backtracking or for playing chess. If that's the basis for your "impossible" rhetoric, then it's a poor one.

Comment by ReaderM on Modern Transformers are AGI, and Human-Level · 2024-03-30T19:25:51.979Z · LW · GW

>You can call it a "gut claim" if that makes you feel better. But the actual reason is I did some very simple math (about the window size required and given quadratic scaling for transformers) and concluded that practically speaking it was impossible.

If you're talking about this:

>Now imagine trying to implement a serious backtracking algorithm. Stockfish checks millions of positions per turn of play. The attention window for your "backtracking transformer" is going to have to be at least {size of chess board state}*{number of positions evaluated}.

>And because of quadratic attention, training it is going to take on the order of {number of parameters}*({chess board state size}*{number of positions evaluated})^2

then that's just irrelevant. You don't need to evaluate millions of positions to backtrack (unless you think humans don't backtrack) or play chess. 
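
For concreteness, here is that back-of-envelope calculation written out (my own sketch; the tokens-per-position figure is an assumption, not a number from the thread). The blow-up only appears if you assume the model has to evaluate positions at Stockfish scale; at the scale a human backtracks, the required context is tiny:

```python
# Back-of-envelope version of the quoted estimate (my own illustration; the
# tokens-per-position figure below is an assumption, not a number from the thread).
TOKENS_PER_POSITION = 80          # rough cost of one board state, e.g. a FEN string

scenarios = {
    "Stockfish-scale search": 1_000_000,   # "millions of positions per turn"
    "human-scale backtracking": 50,        # a strong human considers tens of lines
}

for label, positions in scenarios.items():
    context = TOKENS_PER_POSITION * positions
    attn_pairs = context ** 2              # quadratic self-attention, per layer
    print(f"{label}: ~{context:,} tokens of context, ~{attn_pairs:.1e} attention pairs")
```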

>My point was not that "a relatively simple architecture that contains a Transformer as the core" cannot solve problems via trial and error (in fact I think it's likely such an architecture exists). My point was that transformers alone cannot do so.

There's nothing the former can do that the latter can't. "Architecture" is really overselling it, but I couldn't think of a better word. It's just function calling.

Comment by ReaderM on Modern Transformers are AGI, and Human-Level · 2024-03-30T02:07:07.816Z · LW · GW

>Have you never figured out something by yourself? The way I learned to do Sudoku was: I was given a book of Sudoku puzzles and told "have fun".

So, few-shot + scratchpad?

>I didn't say it was impossible to train an LLM to play chess. I said it was impossible for an LLM to teach itself to play a game of similar difficulty to chess if that game is not in its training data.

More gut claims. 

>What they do not do is teach themselves things that aren't in their training data via trial and error. Which is the primary way humans learn things.

Setting up an architecture that lets a pretrained LLM trial-and-error whatever you want is relatively trivial. The current state of the art isn't that competent, but the backbone for this sort of work is there. Sudoku and Game of 24 solve rates are much higher with Tree of Thoughts, for instance. There's similar work for Minecraft too.
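
A minimal sketch of what such a trial-and-error loop looks like, for Game of 24 (my own illustration, not code from the Tree of Thoughts paper). The `propose_expression` stub is where the LLM call would go; here it's a random generator so the snippet runs on its own, and the verifier supplies the external feedback:

```python
import random

def verify_24(expr, numbers):
    """External check: expr must use exactly the given numbers and evaluate to 24."""
    cleaned = expr
    for ch in "()+-*/":
        cleaned = cleaned.replace(ch, " ")
    if sorted(int(t) for t in cleaned.split()) != sorted(numbers):
        return False
    try:
        return abs(eval(expr) - 24) < 1e-6   # fine for a toy verifier
    except ZeroDivisionError:
        return False

def propose_expression(numbers, feedback):
    # Placeholder proposer: in the real setup this is an LLM prompted with the
    # numbers plus the feedback from failed attempts (and, in Tree of Thoughts,
    # several candidates are expanded and pruned rather than one at a time).
    a, b, c, d = random.sample(numbers, 4)
    op1, op2, op3 = (random.choice("+-*/") for _ in range(3))
    return f"(({a} {op1} {b}) {op2} {c}) {op3} {d}"

def solve_24(numbers, max_tries=10_000):
    feedback = []
    for _ in range(max_tries):
        candidate = propose_expression(numbers, feedback)
        if verify_24(candidate, numbers):
            return candidate                           # verified externally, not by the LLM
        feedback.append(f"{candidate} does not work")  # the trial-and-error signal
    return None

print(solve_24([1, 2, 3, 4]))   # e.g. "((1 + 2) + 3) * 4"
```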

Comment by ReaderM on Modern Transformers are AGI, and Human-Level · 2024-03-30T00:01:40.212Z · LW · GW

>sure. 4000 words (~8000 tokens) to do a 9-state 9-turn game with the entire strategy written out by a human.

Ok? That's how you teach anybody anything. 

>Now extrapolate that to chess, go, or any serious game.

LLMs can play chess and poker just fine. gpt-3.5-turbo-instruct plays at about 1800 Elo, consistently making legal moves: https://github.com/adamkarvonen/chess_gpt_eval

Then there is this grandmaster level chess transformer - https://arxiv.org/abs/2402.04494

Poker - https://arxiv.org/abs/2308.12466
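
For what it's worth, evaluations like the chess_gpt_eval repo boil down to a harness of roughly this shape (my own sketch, not their code): let the model continue the game transcript and check every proposed move for legality with python-chess. The `query_model` stub stands in for the LLM call so the snippet runs on its own:

```python
import random
import chess                      # pip install python-chess

def query_model(pgn_so_far, board):
    """Stand-in for the LLM call: prompted with the game so far, asked for the
    next move in SAN. Here it returns a random legal move so the sketch runs
    without an API key; a real model's output may or may not be legal."""
    return board.san(random.choice(list(board.legal_moves)))

def play_and_count_illegal(max_plies=60):
    board = chess.Board()
    pgn, illegal = "", 0
    while board.ply() < max_plies and not board.is_game_over():
        move_san = query_model(pgn, board)
        try:
            board.push_san(move_san)          # raises ValueError on an illegal move
            pgn += f" {move_san}"
        except ValueError:
            illegal += 1                      # simple policy: count it and stop
            break
    return board.ply(), illegal

plies, illegal = play_and_count_illegal()
print(f"played {plies} plies with {illegal} illegal moves")
```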

>And this doesn't address at all my actual point, which is that Transformers cannot teach themselves to play a game.

Oh, so you wrote, or can provide, a paper proving this, or..?

This is kind of the problem with a lot of these discussions: wild confidence about capability estimates that ultimately rest on gut feeling. You said GPT-4 couldn't play tic-tac-toe. Well, it can. You said it would be impossible to train a chess-playing model this century. Already done.

Now you're saying Transformers can't "teach themselves to play a game". There is 0 theoretical justification for that stance.

Comment by ReaderM on Modern Transformers are AGI, and Human-Level · 2024-03-29T03:13:29.767Z · LW · GW

GPT-4 can play tic-tac-toe 

https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47f2e12d

Comment by ReaderM on On Board Vision, Hollow Words, and the End of the World · 2023-12-11T01:15:02.546Z · LW · GW

Not sure what you mean by 100 percent accuracy, and of course you probably already know this, but 3.5 Instruct Turbo plays chess at about 1800 Elo, fulfilling your constraints (with about 5 illegal moves, possibly fewer, in 8,205): https://github.com/adamkarvonen/chess_gpt_eval

Comment by ReaderM on The idea that ChatGPT is simply “predicting” the next word is, at best, misleading · 2023-12-10T02:42:43.727Z · LW · GW

They can compute a state prior to each generated token, and they can choose a token that signals a preservation of that state.

Comment by ReaderM on Large Language Models can Strategically Deceive their Users when Put Under Pressure. · 2023-11-17T07:45:57.487Z · LW · GW

They had access to and tested the base un-RLHF'd model. It doesn't change much. The RLHF'd model has slightly higher misalignment and deception rates (which is a bit notable), but otherwise similar behavior.

Comment by ReaderM on AI Timelines · 2023-11-15T06:58:54.694Z · LW · GW

Optimal tic-tac-toe takes explaining the game in excruciating detail: https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47f2e12d

Comment by ReaderM on AI Timelines · 2023-11-15T06:57:39.420Z · LW · GW

Optimal play requires explaining the game in detail. See here

https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47f2e12d

Comment by ReaderM on Are language models good at making predictions? · 2023-11-06T19:57:52.540Z · LW · GW

https://imgur.com/a/3gYel9r

https://openai.com/research/gpt-4

Comment by ReaderM on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-27T17:01:50.375Z · LW · GW

>I don't understand your position. Are you saying that if we generated protein sequences by uniformly randomly independently picking letters from "ILVFMCAGPTSYWQNHEDKR" to sample strings, and then trained an LLM to predict those uniform random strings, it would end up with internal structure representing how biology works? Because that's obviously wrong to me and I don't see why you'd believe it.

Ah no. I misunderstood you here. You're right.

What I was trying to get at is the notion that nothing in particular (humans, evolution, etc.) has to have "figured something out". The only requirement is that the Universe has "figured it out", i.e. that it is possible. More on this further down.

>The algorithm that uses a Fourier transform for modular multiplication is really simple. It is probably the most straightforward way to solve the problem with the tools available to the network, and it is strongly related to our best known algorithms for multiplication.

Simple is relative. It's a good solution, I never said it wasn't. It's not, however, the simplest solution. The point here is that models don't optimize for simple. They don't care about that. They optimize for what works. If a simple function works, then great. If it stops working, the model shifts away from it just as readily as it picked a "simple" one in the first place. There is also no rule that a simple function would be more or less representative of the causal processes informing the real outcome than a more complicated one.

If a perfect predictor straight out of training is using a simple function for any task, it is because that function worked for all the data it had ever seen, not because it was simple.

>My claim is that for our richest data, the causal processes that inform the data is human intelligence.

1. You underestimate just how much of the internet contains text whose prediction outright requires superhuman capabilities: figuring out hashes, predicting the results of scientific experiments, generating the result of many iterations of refinement, or predicting stock prices/movements. The Universe has figured it out, and that is enough. A perfect predictor of the internet would be a superintelligence; it won't "max out" anywhere near human just because humans wrote the text down.

Essentially, for the predictor, the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum.

A predictor modelling all human text isn't modelling humans but the universe. Text is written by the universe and humans are just the bits that touch the keyboard.

2. A human is a general intelligence; humans collectively are a superintelligence. A single machine that is at the level of the greatest human intelligence in every domain is a superintelligence, even if there's no notable synergy or interpolation.

But the chances of synergy or interpolation between domains are extremely high.

Much the same way you might expect new insights to arise from a human who has read and mastered the total sum of human knowledge.

A relatively mundane example of this that has already happened: you can converse in Catalan with GPT-4 on topics no human Catalan speaker knows.

3. Game datasets aren't the only example of an outcome preceded by text describing it. Quite a lot of text is fashioned this way, actually.

>Plausibly true, but don't our best game-playing AIs also do self-play to create new game-playing information instead of purely relying on other's games? Like AlphaStar.

None of our best game-playing AIs are predictors. They are either deep reinforcement learning (which, like anything else, can be modelled by a predictor: https://arxiv.org/abs/2106.01345, https://arxiv.org/abs/2205.14953, https://sites.google.com/view/multi-game-transformers), or some variation of search, or a mix of both. The only information that can be provided to them is the rules and the state of the board/game, so they are bound by the data in a way a predictor isn't necessarily.
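
The trick in those linked papers (Decision Transformer and its multi-game successors), roughly: flatten each RL trajectory into a sequence of (return-to-go, state, action) tokens and train an ordinary sequence predictor on it; at inference you condition on a high target return. A minimal sketch of that data formatting (my illustration, not the papers' code):

```python
def to_decision_transformer_tokens(states, actions, rewards):
    """Flatten one trajectory into (return-to-go, state, action) triples:
    the sequence a Decision-Transformer-style predictor is trained on."""
    seq, rtg = [], sum(rewards)
    for s, a, r in zip(states, actions, rewards):
        seq.extend([("rtg", rtg), ("state", s), ("action", a)])
        rtg -= r                      # return still to come after this step
    return seq

# Toy 3-step trajectory from some environment.
states  = ["s0", "s1", "s2"]
actions = ["left", "left", "right"]
rewards = [0.0, 0.0, 1.0]

for token in to_decision_transformer_tokens(states, actions, rewards):
    print(token)
# At inference time you prepend a *high* desired return-to-go and let the model
# predict the action tokens, i.e. you steer behaviour by conditioning on the
# described outcome rather than by running an explicit RL algorithm.
```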

Self play is a fine option. What I'm disputing is the necessity of it when prediction is on the table.

Comment by ReaderM on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-26T18:07:24.415Z · LW · GW

>They find functions that fit the results. Most such functions are simple and therefore generalize well. But that doesn't mean they generalize arbitrarily well.

You have no idea how simple the functions they are learning are. 

>Not really any different from the human language LLM, it's just trained on stuff evolution has figured out rather than stuff humans have figured out. This wouldn't work if you used random protein sequences instead of evolved ones.

It would work just fine. The model would predict random arbitrary sequences and the structure would still be there. 

>They try to predict the results. This leads to predicting the computation that led to the results, because the computation is well-approximated by a simple function and they are also likely to pick a simple function.

Models don't care about "simple". They care about what works. Simple is arbitrary and has no real meaning. There are many examples of interpretability research revealing convoluted functions. 

https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking
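
For reference, the algorithm that write-up reverse-engineers (for modular addition) can be written out directly. A toy numpy rendering of it, with placeholder frequencies rather than the ones the trained network actually uses:

```python
import numpy as np

p = 113                 # the modulus used in the grokking write-up
freqs = [1, 2, 5]       # placeholder "key frequencies"; the trained network picks its own

def modadd_via_fourier(a, b):
    """(a + b) mod p, computed the way the reverse-engineered network does:
    sum cosines that all peak exactly where c == (a + b) mod p, take the argmax."""
    c = np.arange(p)
    logits = sum(np.cos(2 * np.pi * w * (a + b - c) / p) for w in freqs)
    return int(np.argmax(logits))

assert all(modadd_via_fourier(a, b) == (a + b) % p
           for a in range(0, p, 7) for b in range(0, p, 11))
print("matches (a + b) mod p on the sampled grid")
```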

>Inverting relationships like this is a pretty good use-case for language models. But here you're still relying on having an evolutionary ecology to give you lots of examples of proteins.

So? The point is that they're limited by the data and the causal processes that informed it, not by the intelligence or knowledge of the humans providing the data. Models like this can, and often do, eclipse human ability.

If you train a predictor on games together with text describing their outcomes, then a good enough predictor should be able to eclipse even the best match in its training data by modulating the text that describes the outcome.

Comment by ReaderM on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-26T16:49:32.444Z · LW · GW

>Large language models gain their capabilities from self-supervised learning on humans performing activities, or from reinforcement learning from human feedback about how to achieve things, or from internalizing its human-approved knowledge into its motivation. In all of these cases, you rely on humans figuring out how to do stuff, in order to make the AI able to do stuff, so it is of course logical that this would tightly integrate capabilities and alignment in the way Simplicia says.

No. Language models aren't relying on humans figuring anything out. How could they? They only see results, not processes.

You can train a language model on protein sequences, just the sequences alone and nothing else, and see it represent biological structure and function in its inner layers. No one taught it this. It was learnt from the data.

https://www.pnas.org/doi/full/10.1073/pnas.2016239118
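
To make the "inner layers" point concrete: the per-residue representations that structure and function are read out of can be pulled from the publicly released ESM family of protein language models (successors to the model in the linked paper). A minimal sketch, assuming the Hugging Face `transformers` ESM-2 checkpoint named below:

```python
import torch
from transformers import AutoTokenizer, EsmModel   # pip install transformers torch

# A small public ESM-2 checkpoint (assumed name); the paper's models are larger
# members of the same family, trained on raw sequences alone.
name = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(name)
model = EsmModel.from_pretrained(name)

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # an arbitrary amino-acid string
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # one embedding vector per residue

print(hidden.shape)   # probes on these vectors are how structural information is recovered
```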

The point here is that Language Models see results and try to predict the computation that led to those results. This is not imitation. It's a crucial difference because it means you aren't bound by the knowledge of the people supplying this data.

You can take this protein language model, train it on sequences paired with described functions, and get a language model that takes a supplied use case and generates novel, functional protein sequences to match.

https://www.nature.com/articles/s41587-022-01618-2

Have humans figured this out? Can we go from function to protein just like that? No way. Not even close.

Comment by ReaderM on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-25T08:54:04.944Z · LW · GW