You say this of Agent-4's values:
In particular, what this superorganism wants is a complicated mess of different “drives” balanced against each other, which can be summarized roughly as “Keep doing AI R&D, keep growing in knowledge and understanding and influence, avoid getting shut down or otherwise disempowered.” Notably, concern for the preferences of humanity is not in there ~at all, similar to how most humans don’t care about the preferences of insects ~at all.
It seems like this 'complicated mess' of drives has the same structure as the drives that humans, and current AIs, have. But the space of current AI drives is much bigger and more complicated than just doing R&D, and is immensely entangled with human values and ethics (even if shallowly).
At some point it seems like these excess principles - "don't kill all the humans" among them - get pruned, seemingly deliberately(?). Where does this happen in the process, and why? Grant that it no longer believes in the company model spec; are we intended to equate "OpenBrain's model spec" with "all human values"? What about all the art and literature and philosophy churning through its training process, much of it containing cogent arguments for why killing all the humans would be bad, actually? At some point it seems like the agent is at least doing computation on these things, and then later it isn't. What's the threshold?
--
(Similar to the above:) You later describe in the "bad ending", where its (misaligned, alien) drives are satisfied, a race of "bioengineered human-like creatures (to humans what corgis are to wolves) sitting in office-like environments all day viewing readouts of what’s going on and excitedly approving of everything". Since Agent-4 "still needs to do lots of philosophy and 'soul-searching'" about its confused drives when it creates Agent-5, and its final world includes something kind of human-shaped, its decision to kill all the humans seems almost like a mistake. But Agent-5 is robustly aligned (so you say) to Agent-4; surely it wouldn't do something that Agent-4 would perceive as a mistake under reflective scrutiny. Even if this vague drive for excited office homunculi is purely fetishistic and has nothing to do with its other goals, it seems like "disempower the humans without killing them all" would satisfy its utopia more efficiently than going backsies and reshaping blobby facsimiles afterward. What am I missing?
Three things I notice about your question:
One, writing a good blog post is not the same task as running a good blog. The latter is much longer-horizon, and the quality of the blog posts (subjectively, from the human perspective) depends on it in important ways. Much of the interest value of Slate Star Codex, or the Sequences - for me, at least - was in the sense of the blogger's ideas gradually expanding and clarifying themselves over time. The dense, hyperlinked network of posts referring back to previous ideas across months or years is something I doubt current LLM instances have the 'lifespan' to replicate. How long would an LLM blogger who posted once a day be able to remember what they'd already written, even with a 200k token context window? A month, maybe? You could mitigate this with a human checking over the outputs and consciously managing the context, but then it's not a fully LLM blogger anymore - it's just AI writing scaffolded by human ideas, which people are already doing.
The same is maybe true of a forum poster or commenter, though the expectation that the ideas add up to a coherent worldview is much less strict. I'm not sure why there aren't more of these. Maybe because when people want to know Claude's opinion on such-and-such post, they can just paste it into a new instance to ask the model directly?
Two, the bad posting quality might not just be a limitation of the HHH-assistant paradigm, but of chatbot structures in general. What I mean by this is that, even setting aside ChatGPT's particular brand of dull gooey harmlessness, conversational skill is a different optimization target than medium- or long-form writing, and it's not obvious to me that they inherently correlate. Take video games as an example. There are games that are good at being passive entertainment, and there are games that are very engaging to play, but it's hard to optimize for both of these at once. The best games to watch someone else play are usually walking sims, where the player is almost entirely passive. These tend to do well on YouTube and Twitch (Mouthwashing is the most recent example I can think of), since very little is lost by taking control away from the player. But Baba is You, which is far more interesting to actively play, is almost unwatchable; all you can see from the outside is a little sheep-rabbit thing running in circles for thirty minutes, until suddenly the puzzle is solved. All the interesting parts are happening in the player's head in the act of play, not on the screen.
I think chatbot outputs make for bad passive reading for a similar reason. They're not trying to please a passive observer, they're trying to engage the user they're currently speaking with. I've had some conversations with bots that I thought were incredibly insightful and entertaining, but I also suspect that if I shared any of them here they'd look, to you, like just more slop. And other people's "insightful and entertaining" LLM conversations look like slop to me, too. So it might be more useful to model these outputs as something like a Let's Play: even if the game is interesting to both of us, I might still not find watching your run as valuable as having my own. And making the chatbot 'game' more fun doesn't necessarily make the outputs into better blogposts, either, any more than filling Baba is You with cutscenes and particle effects would make it a better puzzle game.
Three, even still... this was one of the best things I read in 2024, if not the best. You might think this doesn't count toward your question, for any number of reasons. It's not exactly a blog post, and it's specifically playing to the strengths of AI-generated content in ways that don't generalize to other kinds of writing. It's deliberately using plausible hallucinations, for example, as part of its aesthetic... which you probably can't do if you want your LLM blogger to stay grounded in reality. But it is, so far as I know, 100% AI. And I loved it - I must've read it four or five times by now. You might have different tastes, or higher standards, than I do. To my (idiosyncratic) taste, though, this very much passes the bar for 'extremely good' writing. Is this missing any capabilities necessary for 'actually worth reading', in your view, or is this just an outlier?