A decade of lurking, a month of posting
by Max H (Maxc) · 2023-04-09
This post is a look back on my first month or so as an active contributor on LessWrong, after lurking for over a decade. My experience so far has been overwhelmingly positive, and one purpose of this post is to encourage other lurkers to start contributing too.
The reason I decided to start posting, in a nutshell:
For the last 10 years or so, I've been following Eliezer's public writing and nodding along in silent agreement with just about everything he says.
I mostly didn't feel like I had much to contribute to the discussion, at least not enough to overcome the activation energy required to post, which for me seems to be pretty high.
However, over the last few years, and especially the last few months, I've grown increasingly alarmed and disappointed by the number of highly-upvoted and well-received posts on AI, alignment, and the nature of intelligent systems which seem fundamentally confused about certain things. These (perceived) misunderstandings and confusions seem especially prominent in posts which reject all or part of the Yudkowskian view of intelligence and alignment.
I notice Eliezer's own views seem to be on the outs with some fraction of prominent posters these days. One hypothesis for this is that Eliezer is actually wrong about a lot of things, and that people are right to treat his ideas with skepticism.
Reading posts and comments from both Eliezer and his skeptics, though, I find this hypothesis unconvincing. Eliezer may sometimes be wrong about important things, but his critics don't seem to be making a very strong case.
(I realize the paragraphs above are potentially controversial. My intent is not to be inflammatory or to attack anyone. My goal in this post is simply to be direct about my own beliefs, without getting too much into the weeds about why I hold them.)
My first few posts and comments have been an attempt to articulate my own understanding of some concepts in AI and alignment which I perceive as widely misunderstood. My goal is to build a foundation from which to poke and prod at some of the Eliezer-skeptical ideas, to see if I have a knack for explaining where others have failed. Or, alternatively, to see if I am the one missing something fundamental, which becomes apparent through more active engagement.
Overview of my recent posts
This section is an overview of my posts so far, ranked by which ones I think are the most worth reading.
Most of my posts assume some background familiarity with, if not agreement with, Yudkowskian ideas about AI and alignment. This makes them less accessible as "101 explanations", but allows me to wade a bit deeper into the weeds without getting bogged down in long introductions.
Steering systems
My longest and most recent post, and the one that I am most proud of.
As of publishing this piece, it has gotten a handful of strong and weak upvotes, and zero downvotes. I'm not sure if this indicates it dropped off the front page before it could get more engagement, or if it was simply not interesting enough per-word for most people in its target audience to read to the end and vote on it.
The main intuition I wanted to convey in this post is how powerful systems might be constructed in the near future by composing "non-agentic" foundation models in relatively simple ways. And further, that this can lead to extreme danger or failure even before we get to the point of having to worry about even more powerful systems reflecting, deceiving, power-seeking, or exhibiting other more exotic examples of POUDA.
I'll highlight one quote from this piece, which I think is a nice distillation of a key insight for making accurate predictions about how the immediate future of LLMs is likely to play out:
Training GPT-4 was the work of hundreds of engineers at OpenAI and millions of dollars of computing resources. LangChain is maintained by a very small team. And a single developer can write a python script which glues together chains of OpenAI API calls into a graph. Most of the effort was in training the LLM, but most of the agency (and most of the useful work) comes from the relatively tiny bit of glue code that puts it all together at the end.
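To make the "tiny bit of glue code" concrete, here is a minimal sketch of what chaining LLM calls into a pipeline can look like. This is a hypothetical illustration, not LangChain's actual API: `call_llm` is a stub standing in for a real OpenAI API call, and the prompt templates are invented for the example.

```python
# Minimal sketch of "glue code" that chains LLM calls into a pipeline.
# call_llm is a stub; a real script would call the OpenAI API here.

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM API call (e.g. a chat completion request)."""
    return f"<answer to: {prompt}>"

def chain(steps: list[str], user_input: str) -> str:
    """Feed each step's output into the next step's prompt template."""
    result = user_input
    for template in steps:
        result = call_llm(template.format(result))
    return result

# A two-step "agent": first plan, then execute the plan.
output = chain(
    ["Break this task into steps: {}", "Carry out this plan: {}"],
    "summarize a research paper",
)
print(output)
```

The point of the sketch is how little code sits between a bare model and something agent-shaped: the loop and the prompt templates are the whole "architecture", while all the capability lives inside the model call.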
Gradual takeoff, fast failure
My first post, and the precursor to "Steering systems". Looking back, I don't think there's much here that's novel or interesting, but it's a briefer introduction to some of the ways I think about things in "Steering systems".
The post is about some ways I see potential for catastrophic failure before the failure modes that arise with the kinds of systems that MIRI and other hard-takeoff research groups tend to focus on. If we somehow make it past those earlier failure modes, though, I think we'll still end up facing the harder problems of hard takeoff.
Grinding slimes in the dungeon of AI alignment research
This post attempts to articulate a metaphor for the different ways different kinds of alignment research might contribute to increasing or decreasing x-risk.
I still like this post, but looking back, I think I should have explained the metaphor in more detail for people who aren't familiar with RPGs. Also, "grinding in the slime dungeons" might have come across as dismissive of alignment research focused on current AI systems, which I didn't intend. I do think we are in the "early game" of AI systems and alignment, and slimes are a common early-game enemy in RPGs. That was the extent of the point I was trying to make with that part of the analogy.
Instantiating an agent with GPT-4 and text-davinci-003
This was mostly just my own fun attempt at experimenting with GPT-4 when I first got access. Others have done similar, more impressive things, but doing the experiment and writing the post gave me a better intuitive understanding of GPT-4's capabilities and the potential ways that LLMs can be arranged and composed into more complex systems. I think constructions like the one in this Twitter thread demonstrate the point I was trying to make in a more concrete and realistic way.
Takeaways and observations
- Writing is hard, writing well is harder. I have a much greater appreciation for prolific writers who manage to produce high quality, coherent, and insightful posts on a regular basis, whether I agree with their conclusions or not.
- Engaging and responding to critical and differing views is also hard. Whether someone responds to a particular commenter, or even to a highly-upvoted post with differing views, is very little evidence about whether their own ideas are valid and correct; it's mostly evidence about how much time and energy they have to engage.
- The posts I'm most proud of are not the ones that got the most karma. Most of my karma comes from throwaway comments on popular linkposts, and my own most highly-upvoted submission is a podcast link.
I don't think this is a major problem - I'm not here to farm karma or maximize engagement, and my higher-effort posts and comments tend to have a smaller target audience.
More broadly, I don't think the flood of high-engagement but less technically deep posts on LW is crowding out more substantive posts (either my own or others') in a meaningful way. (Credit to the LW development team for building an excellent browsing UX.)
I do think the more substantive posts crowd each other out to some degree - I spend a fair amount of time reading and voting on substantive new submissions, and still feel like there's a lot of good stuff I'm missing due to time constraints.
- I encourage other longtime lurkers to consider becoming active. Even if you initially get low engagement or downvotes, as long as you understand and respect the norms of the community, your participation will be welcome. My experience so far has been overwhelmingly positive, and I wish I had started sooner.
- The "Get feedback" feature exists and is great. I didn't use it for this post, but Justis from the LW moderation team gave me some great feedback on Steering systems, which I think made the post stronger.
Miscellaneous concluding points
- I welcome any feedback or engagement with my existing posts, even if it's not particularly constructive. Also welcome are any ideas for future posts or pieces to comment on, though I have many of my own ideas already.
- I realize that I made some controversial claims in the intro, and left them totally unsupported. Again, my intent is not to be inflammatory; the point here is just to stake out my own beliefs as concisely and clearly as possible.
Object-level discourse on these claims about AI alignment and differing viewpoints in the comments of this post is fine with me, though I might not engage with it immediately (or at all) if the volume is high, or even if it isn't.
- Despite my somewhat harsh words, I still think LW is the best place on the internet for rationality and sane discourse on AI and alignment (and many other topics), and nowhere else comes close.
- My real-life identity is not secret, though I'd prefer for now that my LW postings not be attached to my full name when people Google me for other reasons. PM me here or on Discord (m4xed#7691) if you want to know who I am. (Despite my past inclination to online lurking, I've been a longtime active participant in the meatspace rationality community in NYC. 👋)