Steering subsystems: capabilities, agency, and alignment 2023-09-29T13:45:00.739Z
AGI isn't just a technology 2023-09-01T14:35:57.062Z
Internal independent review for language model agent alignment 2023-07-07T06:54:11.552Z
Simpler explanations of AGI risk 2023-05-14T01:29:29.289Z
A simple presentation of AI risk arguments 2023-04-26T02:19:19.164Z
Capabilities and alignment of LLM cognitive architectures 2023-04-18T16:29:29.792Z
Agentized LLMs will change the alignment landscape 2023-04-09T02:29:07.797Z
AI scares and changing public beliefs 2023-04-06T18:51:12.831Z
The alignment stability problem 2023-03-26T02:10:13.044Z
Human preferences as RL critic values - implications for alignment 2023-03-14T22:10:32.823Z
Clippy, the friendly paperclipper 2023-03-02T00:02:55.749Z
Are you stably aligned? 2023-02-24T22:08:23.098Z


Comment by Seth Herd on My Current Thoughts on the AI Strategic Landscape · 2023-09-29T14:46:39.255Z · LW · GW

I agree that there's a heavy self-selection bias for those working in safety or AGI labs. So I'd say both of these factors are large, and how to balance them is unclear.

I agree that you can't use the Wright Brothers as a reference class, because you don't know in advance who's going to succeed.

I do want to draw a distinction between AI researchers, who think about improving narrow ML systems, and AGI researchers. There are people who spend much more time thinking about how breakthroughs to next-level abilities might be achieved, and what a fully agentic, human-level AGI would be like. The line is fuzzy, but I'd say these two ends of a spectrum exist. I'd say the AGI researchers are more like the society for aerial locomotion. I assume that society had a much better prediction than the class of engineers who'd rarely thought about integrating their favorite technologies (sailmaking, bicycle design, internal combustion engine design) into flying machines.

Comment by Seth Herd on My Arrogant Plan for Alignment · 2023-09-28T22:31:06.743Z · LW · GW

The part where you debate the merits of this ridiculously arrogant plan and decide against it.

The premise that no one else cares is factually false, so reading farther seemed like a waste of time. You said at the top it would be fun for you to write it, so I assumed that was the main purpose, and it wasn't even claiming to be worth my time.

Comment by Seth Herd on My Arrogant Plan for Alignment · 2023-09-28T22:06:22.838Z · LW · GW

I think you really want to edit in some sort of pitch for reading this, right at the top. The premises are both silly and kind of irritating, so I was barely motivated enough to scroll to the bottom to see if you had a point that anyone would care about. You did. So put it up front. Time and attention are limited resources.

Comment by Seth Herd on The point of a game is not to win, and you shouldn't even pretend that it is · 2023-09-28T22:01:58.650Z · LW · GW

That's what the whole post was about. You don't seem to be engaging with it, just contradicting it without addressing any of the arguments.

Comment by Seth Herd on The point of a game is not to win, and you shouldn't even pretend that it is · 2023-09-28T21:48:35.060Z · LW · GW

Learning is fun, but it's not the only thing that's fun. Anyway, when you pitch your peacewagers game, I strongly suggest you talk about fun ;)

Comment by Seth Herd on The point of a game is not to win, and you shouldn't even pretend that it is · 2023-09-28T20:11:59.355Z · LW · GW

It's foolish to accept a final goal someone else gives you, let alone a piece of paper in a box. If you're not thinking about why you want to win, you're being foolish. I'm sure Sirlin goes into why winning is a good goal, but you haven't given us any clues here.

Comment by Seth Herd on The point of a game is not to win, and you shouldn't even pretend that it is · 2023-09-28T20:09:49.808Z · LW · GW

I think you're missing an important point of games: fun.

I think that's the biggest reason people play games.

All of the things you've mentioned can be fun. And sometimes people play games to understand each other better, or to learn. But they largely do it because those things are fun, too.

We're wired to like fun because it causes us to practice, and to win. But we genuinely like fun.

(Winning is usually a lot of fun; the problem is that others have to lose, so less total fun is produced in a one-winner game.)

That's mostly why people play games, and you've got to take that into account in your game design.

Comment by Seth Herd on Peacewagers so Far · 2023-09-28T19:50:45.831Z · LW · GW

Good points. Then I think you want to market to / pursue parents. They want their children to be good at resisting bad deals, but wise in how they exercise their power over others.

I think for adults there's also a huge pitch for this being a new type of game: cooperative and competitive at the same time. Play it your way. (And learn how that works out).

You might actually want to make it a tiny bit more competitive to entice people who think cooperative games are dull. It would be easy to put the points on the same scale by looking at average scores of different roles over just a few sessions. There still wouldn't be one winner, but it would make it more of a challenge to improve your own play and play better than others. Challenge is motivating.
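
One way to picture the "same scale" idea is the rough sketch below. The role names and scores are made up for illustration, and dividing by the role's average is just one assumed way of normalizing, not something taken from the game's rules.

```python
from collections import defaultdict
from statistics import mean


def role_averages(sessions):
    """Average raw score per role across a handful of past sessions."""
    scores_by_role = defaultdict(list)
    for session in sessions:
        for role, score in session.items():
            scores_by_role[role].append(score)
    return {role: mean(scores) for role, scores in scores_by_role.items()}


def normalized_scores(session, averages):
    """Express each raw score as a multiple of that role's average score."""
    return {role: score / averages[role] for role, score in session.items()}


# Hypothetical roles and scores, purely for illustration.
past_sessions = [
    {"merchant": 40, "warlord": 10},
    {"merchant": 60, "warlord": 30},
]
averages = role_averages(past_sessions)
tonight = normalized_scores({"merchant": 45, "warlord": 25}, averages)
```

Here the merchant's 45 points comes out below par for that role while the warlord's 25 comes out above it, so the two become comparable even though the roles aren't balanced. A role whose average is zero would need special handling.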

Comment by Seth Herd on My Current Thoughts on the AI Strategic Landscape · 2023-09-28T19:28:28.355Z · LW · GW

Thanks for contributing your views. I think it's really important for us to understand others' views on these topics, as this helps us have sensible conversations, faster.

Most of your conclusions are premised on AGI being a difficult project from where we are now. I think this is the majority view outside of alignment circles and AGI labs (which are different from AI labs).

My main point is that our estimate of AGI difficulty should include very short timelines. We don't know how hard AGI might be, but we also have never known how easy it might be.

After a couple of decades studying the human brain and mind, I'm afraid we're quite close to AGI. It looks to me like the people who think most about how to build AGI tend to think it's easier than those who don't. This seems important. The most accurate prediction of heavier-than-air flight would've come from the Wright brothers (and I believe their estimate was far longer than it actually took them). As we get closer to it, I personally think I can see the route there, and that exactly zero breakthroughs are necessary. I could easily be wrong, but it seems like expertise in how minds work probably counts somewhat in making that estimate.

I think there's an intuition that what goes on in our heads must be magical and amazing, because we're unique. Thinking hard about what's required to get from AI to us makes it seem less magical and amazing. Higher cognition operates on the same principles as lower cognition. And consciousness is quite beside the point (it's a fascinating topic; I think what we know about brain function explains it rather well, but I'm resisting getting sidetracked by that because it's almost completely irrelevant for alignment).

I'm always amazed by people saying "well sure, current AI is at human intelligence in most areas, and has progressed quickly, but it will take forever to do that last magical bit".

I recognize that you have a wide confidence interval and take AGI seriously even if you currently think it's far away and not guaranteed to be important.

I just question why you seem even modestly confident of that prediction.

Again, thanks for the post! You make many excellent points. I think all of these have been addressed elsewhere, and fascinating discussions exist, mostly on LW, of most of those points.

Comment by Seth Herd on How to Become a 1000 Year Old Vampire · 2023-09-28T19:12:06.504Z · LW · GW

This is all premised on the assumption that getting more things done, learning, and acquiring skills are the point of life.

This is very likely false.

So there's another step of analyzing how much accomplishing all of this stuff actually helps you or anyone else.

Comment by Seth Herd on Peacewagers so Far · 2023-09-28T18:32:54.011Z · LW · GW

Marvelous! I loved board games when I was young. Then I stopped liking them when I decided they didn't usually foster interesting conversations. I think you've identified why that is. I've encountered just a couple of games with a cooperative/competitive dynamic.

One is The Bean Game, Bohnanza. It does have the most points win rule, but we played so everyone was interested in how they placed. We also relaxed the restrictions on card trading, at first by not remembering that rule. A remarkable thing emerged over sessions: the most generous and kindest player tended to win. People wanted to be generous back to them. They did more trades, so had better card sets.

Your game here seems marvelous. I particularly love the idea that the players aren't balanced, so you can't compare scores. This makes game design vastly easier and allows more creativity in tinkering with abilities, etc.

I want a copy!

I also think this could be commercially successful; a polished version sounds like it would be aesthetically pleasing, and the tag line of "teach your kids to negotiate in potentially adversarial situations" would be highly compelling for parents.

The pitch you've given is highly compelling to me as a rationalist, so I think you'd sell some to the rationalist community if you could get it even decently manufactured.

Comment by Seth Herd on Is Bjorn Lomborg roughly right about climate change policy? · 2023-09-28T03:19:37.392Z · LW · GW

It seems like that analysis doesn't include large-scale crop failures, which is one of the worst anticipated downsides of climate change. It also doesn't account for the non-death loss of welfare involved in suffering through heatwaves and other extreme weather events, and worrying about the next one. That's not to argue against this whole line of analysis, just to note that it's incomplete.

Comment by Seth Herd on “X distracts from Y” as a thinly-disguised fight over group status / politics · 2023-09-25T21:03:05.297Z · LW · GW

I think the claim is that they're not competing for public attention any more than AInotkilleveryoneism is competing with, say, saving the whales. Intuitively that doesn't sound right. When people think about AI, they'll think of either one or the other a bit more, so there's competition for attention there. But attention to a certain topic isn't a fixed quantity. If AInotkilleveryoneism worked with the current AI harms crowd, we might collectively get more than the sum of public attention we're getting. Or maybe not, for all I know. I'd love it if we had some more media/marketing/PR people helping with this project.

Comment by Seth Herd on “X distracts from Y” as a thinly-disguised fight over group status / politics · 2023-09-25T18:46:12.983Z · LW · GW

Agreed on all points. I want to expand on two issues:

The first I think you agree with: it would be unfortunate if someone read this and thought "yeah, the immediate AI harms crowd is either insincere and Machiavellian at our expense, or just stupid. That's so irritating. I'm gonna go dunk on them". I think that would make matters worse. It will indirectly increase people saying "x-risk distracts from other AI concerns", because a nontrivial factor here is that they're motivated by and expressing irritation at the x-risk faction (whether that's justified or not is beside this point). Us getting irritated at them will make them more irritated with us in a vicious cycle, and voilà, we've got two camps that could be allies, spending their energy undercutting each other's efforts.

You address that point by saying we shouldn't be making the inverse silly argument that immediate harms distract from x-risk. I'd expand it to say that we shouldn't be making any questionable arguments that antagonize other groups. We would probably enhance our odds of survival by actively making allies, and avoiding making enemies by irritating people unnecessarily.

The second addition is that I think the "x-risk distracts from..." argument is usually a sincere belief. I'm not sure if you'd agree with this or not. The framing here could sound like this is a shrewd and deceptive planned strategy from the immediate harms crowd. It might be occasionally, but I know a number of people who are well-intentioned (and surprisingly well-informed) who really believe that x-risk concerns are silly and talking about them distracts from more pressing concerns. I think they're totally wrong, but I don't think they're bad or idiotic people.

I believe in never attributing to malice that which could be attributed to emotionally motivated confirmation bias in evaluating complex evidence.

Comment by Seth Herd on Would You Work Harder In The Least Convenient Possible World? · 2023-09-23T13:43:38.420Z · LW · GW

No human being is a full utilitarian. Expecting them or yourself to be will bring disappointment or guilt.

But helping others can bring great joy and satisfaction.

The answer is obviously yes to Alice's question.

We should work harder in the most convenient world. The premise basically states that Bob would be happier AND do more good. He's an idiot for saying no, except to get bossy, controlling Alice off his back and not let her gaslight him into doing what she wants.

But is this that world? Probably not the least convenient/easiest. Where is it on the spectrum? What will lead to Bob's happiest life? That is the right question for Bob to ask, and it's not trivial to answer.

Comment by Seth Herd on Would You Work Harder In The Least Convenient Possible World? · 2023-09-23T13:36:01.069Z · LW · GW

Sure, but this isn't about Alice. She's not telling Bob or us to talk like her, just asking if he'd work harder.

Comment by Seth Herd on Would You Work Harder In The Least Convenient Possible World? · 2023-09-23T13:33:12.066Z · LW · GW

This does seem more like EA than LW.

Comment by Seth Herd on Would You Work Harder In The Least Convenient Possible World? · 2023-09-23T13:30:53.202Z · LW · GW

Yes, but this isn't about Alice.

Comment by Seth Herd on Let's talk about Impostor syndrome in AI safety · 2023-09-22T15:25:55.886Z · LW · GW

I think imposter syndrome is quite common in all fields.

Comment by Seth Herd on Memory bandwidth constraints imply economies of scale in AI inference · 2023-09-22T15:21:23.123Z · LW · GW

That memory would be used for what might be called semantic indexing. So it's not that I can remember tons of info, it's that I remember it in exactly the right situation.

I have no idea if that's an accurate figure. You've got the synapse count and a few bits per synapse (or maybe more), but you've also got to account for the choice of which cells synapse on which other cells, which is also wired and learned with exquisite specificity, and so constitutes information storage of some sort.

Comment by Seth Herd on The AI Explosion Might Never Happen · 2023-09-21T05:33:29.792Z · LW · GW

Sorry! I'm responding because you did say feedback was appreciated. Did you do a search before posting? Like with most topic-dedicated discussion spots, this topic has a lot of existing discussion. I stopped reading after two paragraphs with no mention of previous posts or articles or anything on the topic. The topic of speed of self-improvement is super important, but if you haven't read anything, the odds of you having important new thoughts seem low.

It's a long post and there's a lot of important stuff to read.

Comment by Seth Herd on A Theory of Laughter—Follow-Up · 2023-09-14T19:14:32.626Z · LW · GW

I think this all makes sense. Just a couple of thoughts to add, which I think are consistent with what you've said:

Laughter from tickling seems like an incentive to play-fight. I think humor is an evolutionary outgrowth of that to incentivize intellectual sparring. Like physical play-fighting, this sharpens skills. Both can also demonstrate skills to improve one's position in a hierarchy, or desirability as a mate. There are theories about humor as a mating display; I'd like to call it stotting or pronking. Those technically are about signaling fitness to predators, not competitors or mates, but maybe we could broaden the term for the sake of humor?

From this perspective, humor is a serious business.

The danger/safety juxtaposition or switch might actually be created by the moment of confusion in parsing a joke.

I have some memories of being an adolescent, not getting the joke, and feeling distinctly in danger. Was I the butt of the joke? Or was I proving I was in the out-group by not getting it?

Even outside of social situations, confusion could be evolutionarily dangerous. Am I lost? Did I misinterpret the clues about where I could find food?

The click as you get the joke brings you back to safety. 

As a side note, this perspective implies that humor doesn't need to extend accidentally from the same principles that create laughter from play-fighting, since it's been actively adapted to serve a purpose from that starting point.

Comment by Seth Herd on A Bat and Ball made me Sad · 2023-09-14T18:37:49.424Z · LW · GW

Does it matter a lot? People are screwing up decisions that don't matter so they can focus on what they actually care about. This has terrible consequences when we take a sum of opinions on complex issues, so I guess that matters.

Comment by Seth Herd on One Minute Every Moment · 2023-09-14T00:02:24.979Z · LW · GW

Vastly more work has been done since then, including refined definitions of working memory. It measures what he thought he was measuring, so it is following his intent. But it's still a bit of a chaotic shitshow, and modern techniques are unclear on what they're measuring and don't quite match their stated definitions, too.

Comment by Seth Herd on A Bat and Ball made me Sad · 2023-09-13T21:48:25.180Z · LW · GW

It's quite rational to ignore what people are telling you to do, and do what's good for you instead. People on MTurk usually do a bunch of it, and they optimize for payout, not for humoring the experimenter or doing a good job.

I think many of them are not paying attention.

There are probably checks on speed of response to weed out obvious gaming. So the smart thing would be to do something else at the same time, and occasionally enter an answer.

I did cognitive psychology and cog neurosci for a couple of decades, and never ceased to be amazed at the claims of "humans can't do this" vs. "these people didn't bother to do this because I didn't bother to motivate them properly".

This isn't to say humans are smart. We're not nearly as smart as we'd like to think. But we're not this dumb, either.

Comment by Seth Herd on A Case for AI Safety via Law · 2023-09-11T21:17:51.637Z · LW · GW

I think you'll find this topic discussed a lot, both pro and con, under the term "regulation".

Comment by Seth Herd on A Bat and Ball made me Sad · 2023-09-11T19:06:53.050Z · LW · GW

Humans are dumb, but not this dumb. This is about people not bothering to answer questions since they're rushing. See other comments.

Comment by Seth Herd on Recreating the caring drive · 2023-09-07T20:26:54.231Z · LW · GW

You should look at the work of Steve Byrnes, particularly his Intro to Brain-Like-AGI Safety sequence. Figuring out the brain mechanisms of prosocial behavior (including what you're terming the caring drive) is his primary research goal. I've also written about this approach in an article, Anthropomorphic reasoning about neuromorphic AGI safety, and in some of my posts here.

Comment by Seth Herd on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-02T23:45:14.290Z · LW · GW

If you agree with everything he said, then you don't think rational agents cooperate on this dilemma in any plausible real-world scenario, right? Even superintelligent agents aren't going to have full and certain knowledge of each other.

Comment by Seth Herd on AGI isn't just a technology · 2023-09-02T19:44:38.913Z · LW · GW

The idea that agentiness is an advantage does not predict that there will never be an improvement made in other ways.

It predicts that we'll add agentiness to those improvements. We are busy doing that. It will prove advantageous to some degree we don't know yet, maybe huge, maybe so tiny it's essentially not used. But that's only in the very near term. The same arguments will keep on applying forever, if they're correct.

WRT your comment that we don't have a handle on values or drives, I think that's flat wrong. We have good models in humans and AI. My post Human preferences as RL critic values - implications for alignment lays out the human side and one model for AI. But providing goals in natural language for a language model agent is another easy route to adding a functional analogue of values.

I will continue for now to focus my alignment efforts on futures where AGI is agentic, because those seem like the dangerous ones, and I have yet to hear any plausible future in which we thoroughly stick to tool AI and don't agentize it at some point.

Edit: Thinking about this a little more, I do see one plausible future in which we don't agentize tool AI: one with a "pivotal act" that makes creating it impossible, probably involving powerful tool AI. In that future, the key bit is human motivations, which I think of as the societal alignment problem. That needs to be addressed to get alignment solutions implemented, so these two futures are addressed by the same work.

Comment by Seth Herd on AGI isn't just a technology · 2023-09-02T19:20:38.059Z · LW · GW

Yes. But the whole point of the alignment effort is to look into the future, rather than have it run us over because we weren't certain what would happen and so didn't bother to make any plans for the different things that could happen.

Comment by Seth Herd on One Minute Every Moment · 2023-09-02T07:02:59.660Z · LW · GW

That task measures what can be written to memory within 5 minutes, given unlimited time to write relevant compression codes into long-term semantic memory. It's complex. See my top-level comment.

Comment by Seth Herd on One Minute Every Moment · 2023-09-02T06:58:56.787Z · LW · GW

I'm not sure what the takeaway is here, but these calculations are highly suspect. What a memory athlete can memorize (in their domain of expertise) in 5 minutes is an intricate mix of working memory and long-term semantic memory, and episodic (hippocampal) memory.

This is a very deep topic. Reading comprehension researchers have estimated the size of working memory as "unlimited", but that's obviously specific to their methods of measurement.

Modern debates put working memory capacity at 1-4 items. The figure of 7 was specific to what is now known as the phonological loop, which is subvocally reciting numbers. The strong learned connections between auditory cortex and verbal motor areas give this a slight advantage over working memory for material that hasn't been specifically practiced a lot.

See the concept of exformation, incidentally from one of the best books I've found on consciousness. The information a signal encodes for a sophisticated system is intricately intermixed with that system's prior learning. It's a type of compression. Not making a call at a specific time can encode a specific signal of unlimited length, if sender and receiver agree to that meaning.
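
As a toy illustration of that last point (a sketch only; the codebook and its messages are invented here, not drawn from the book), a single observed bit can select an arbitrarily long message when both parties already share the mapping:

```python
# Hypothetical codebook agreed on in advance by sender and receiver.
CODEBOOK = {
    True: "All is well.",
    False: "Plan B: meet at the station at noon, bring the documents, and tell no one.",
}


def decode(call_arrived: bool) -> str:
    """Recover the full message from a single observed bit."""
    return CODEBOOK[call_arrived]


# The phone stays silent at the agreed time, which is itself the signal.
message = decode(call_arrived=False)
bits_on_the_wire = 1
```

The length of the message lives in the shared prior knowledge rather than in the signal, which is why counting bits at the input says little about how much was communicated or stored.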

Sorry for the lack of citations. I've had my head pretty deeply into this stuff in the past, but I never saw the importance of getting a precise working memory capacity estimate. The brain mechanisms are somewhat more interesting to me, but for different reasons than estimating capacity (they're linked to goals and reward system operation, since working memory for goals and strategy is probably how we direct behavior in the short term). 

Comment by Seth Herd on AGI isn't just a technology · 2023-09-02T06:39:01.711Z · LW · GW

So are you saying that you don't think we'll build agentic AI any time soonish? I'd love to hear your reasoning on that, because I'd rest easier if I felt the same way.

I agree that LLMs are marvelously non-agentic and intelligent. For the reasons I mentioned, I expect that to change, sooner or later, and probably sooner. Someone invented a marvelous new tool, and I haven't heard a particular reason to not expect this one to become an agent given even a little bit of time and human effort. The argument isn't that it happens instantly or automatically. AutoGPT and similar failing on the first quick public try doesn't seem like a good reason to expect similar language model agents to fail for a long time. I do think it's possible they won't work, but people will give it a more serious try than we've seen publicly so far. And if this approach doesn't hit AGI, the next one will experience similar pressures to be made into an agent.

As for models that make good predictions, that would be nice, but we do probably need to get predictions about agentic, self-aware and potentially self-improving agents right on the first few tries. It's always a judgment call on when the predictions are in the relevant domain. I think maintaining a broad window of uncertainty makes sense.

Comment by Seth Herd on Adumbrations on AGI from an outsider · 2023-09-01T23:14:35.729Z · LW · GW

This is highly useful. Thank you so much for taking the time to write it!

It's not worth debating the points you raise, since the point is you explaining to us where the explanation went wrong for you. That didn't stop many people from doing it, of course :)

I agree strongly with your points about the communication style. It's not possible to address every objection in a short piece, but it is possible to put forth the basic argument in clear and simple terms. I think the type of person who's interested in AI safety typically isn't focused on communicating with laypeople. And we need to get better.

Comment by Seth Herd on AGI isn't just a technology · 2023-09-01T22:46:11.587Z · LW · GW

Yes, I think "tools" is an even more obvious red flag that this person isn't thinking about an agentic, self-aware system.

Comment by Seth Herd on The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts) · 2023-08-31T18:41:15.326Z · LW · GW

This is an interesting idea, but your example oversells it substantially. Future iterations need to not do this, or you'll sound like a huckster and not get support.

If all ten people on the street have to contribute, everyone knows for sure that they won't get what they want if they don't contribute. No payback is necessary; free riding is impossible. That's the easy case. When you just need some amount of money rather than every single individual to contribute, it's quite possible to free ride.
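
To make the distinction concrete, here's a minimal sketch of when a single contributor is pivotal. The dollar amounts are made up, and `is_pivotal` is my own illustrative helper, not part of any assurance contract mechanism.

```python
def is_pivotal(my_pledge, others_total, threshold):
    """True if the project funds with my pledge but fails without it."""
    return others_total < threshold <= others_total + my_pledge


# Street example (hypothetical numbers): ten neighbors must each pay 100
# to reach 1000, so if the other nine pay, I am always pivotal.
street = is_pivotal(100, others_total=900, threshold=1000)

# Open campaign: others may already have covered the goal...
already_funded = is_pivotal(100, others_total=1200, threshold=1000)

# ...or be so far short that my pledge changes nothing.
far_short = is_pivotal(100, others_total=500, threshold=1000)
```

Only in the street case does withholding my money guarantee failure. In the open campaign I'm pivotal only in the narrow band where others land within one pledge of the goal, so free riding usually costs me nothing.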

But why can't I just let other people pay and free-ride? 

You haven't been paying attention. Unless I've priced this contract wrong, if you don't pay it doesn't happen.

This is egregiously wrong. You can't possibly know that this contract will fail if each individual reader of this sentence doesn't contribute. This is the distinction from the example you've used.

So I agree with the other comments that this does not fix the freerider problem. At all. Which is the big problem, not the frictional costs of pledging.

I think you've got to fix that terrible logic in future versions, or you'll sound dishonest.

I still pledged fifty bucks, because improving crowdsourcing like Kickstarter even marginally is a worthy goal! And the frictional cost is still a problem, so overcoming it will help.

Comment by Seth Herd on Perpetually Declining Population? · 2023-08-31T17:49:36.141Z · LW · GW

How will falling population change the high birth rate? I hadn't heard an argument that people aren't having kids because there are too many people. I'd heard that birth rates fall when women get to decide whether to have kids. I'd assumed that's because having kids is really hard, and that labor has just been dumped on women without their consent in the past.

If this is more or less true, it is subject to change in a wealthier society. I would've had kids if I could have been a stay-at-home dad, well-supported by one income. If both parents combined only had to work 20 hours or less, I think people would have a lot more kids. If there were more societal support for having kids (better schools and childcare), even more people would have more kids.

Comment by Seth Herd on AGI-Automated Interpretability is Suicide · 2023-08-30T19:01:53.369Z · LW · GW

Done, thanks!

Comment by Seth Herd on AGI-Automated Interpretability is Suicide · 2023-08-29T23:11:02.298Z · LW · GW

I haven't justified either of those statements; I hope to make the complete arguments in upcoming posts. For now I'll just say that human cognition is solving tough problems, and there's no good reason to think that algorithms would be lots more efficient than networks in solving those problems.

I'll also reference Moravec's Paradox as an intuition pump. Things that are hard for humans, like chess and arithmetic, are easy for computers (algorithms); things that are easy for humans, like vision and walking, are hard for algorithms.

I definitely do not think it's pragmatically possible to fully interpret or reverse engineer neural networks. I think it's possible to do it adequately to create aligned AGI, but that's a much weaker criterion.

Comment by Seth Herd on Dear Self; we need to talk about ambition · 2023-08-29T23:01:05.366Z · LW · GW

The advice to self angle is great.

I also see a lot of career/life advice here that seems like it might be good for some and bad for others. I think this results from variance in what we want and enjoy, and a blindness to it. Our culture (at least academic and tech culture) preaches ambition.

I frequently caution those in grad school against ambition. I've seen ambition take a high toll from too many bright young people, and seen them later declare that they're happier after deciding to focus more on quality-of-life than ambition.

This is an extremely deep topic. There are important questions about what our ultimate goals really are. I suspect most of us want love and respect from ourselves and those around us, and to enjoy what we do moment by moment. But opinions vary, even once they're well-thought-out. I think the remembering self is just a set of moments of the experiencing self, and shouldn't be privileged in making life-decisions. But again, opinions vary.

These questions don't answer the strategy question of how to best pursue your personal goals, but they do help frame it. I'm frequently astonished by how little time very smart people have spent on considering what they really want from life. Hopefully, rationalists do this more than the academics and startup types I'm more familiar with.

Comment by Seth Herd on The Evidence for Question Decomposition is Weak · 2023-08-29T03:24:36.986Z · LW · GW

I think this is highly confounded with effort. Asking people to decompose a forecast will, on average, cause them to think more. This further calls into question any positive findings for decomposition.

I find this baffling. It seems like breaking predictions into sub-parts should help. But I haven't thought about it much :)

One possible counter-factor is in structuring people's judgments artificially. If asking them to break a prediction into sub-parts makes them factor the problem in different ways than they would in their own thinking, I can see how that would hurt judgments.

And it could actually cost time. Asking sub-questions could cause people to spend their cognitive time on the particulars of those sub-problems, rather than spending that time on sub-problems they thought of themselves, and that work naturally with their overall strategy for making that prediction.

Comment by Seth Herd on Which possible AI systems are relatively safe? · 2023-08-28T18:33:43.499Z · LW · GW

Another factor in choosing the safest type of AGI is whether it can practically be built soon.

The perfect is the enemy of the good. A perfectly safe system that will be deployed five years after the first self-improving AGI is probably useless.

Of course the safest path is to never build an agentic AGI. But that seems unlikely.

This criterion is another argument for language model agents. I've outlined their list of safety advantages here.

Of course, we don't know if language model agents will achieve full AGI. 

Another path to AGI that seems both achievable and alignable is loosely brainlike AGI, along the lines of LeCun's proposed H-JEPA. Steve Byrnes' "plan for mediocre alignment" seems extensible to become quite a good plan for this type of AGI.  

Comment by Seth Herd on The Game of Dominance · 2023-08-27T22:53:25.338Z · LW · GW

I take LeCun to be more of a troll than a real thinker on the topic. His arguments are worth refuting only because his expertise will make others believe and echo them. Those refutations need to be concise to work in the public sphere, I think.

Comment by Seth Herd on EfficientZero: How It Works · 2023-08-27T17:41:14.372Z · LW · GW

This particular algorithm can't do long-term planning. It's reactive. It only plans about five steps out, which is very short in the videogame space and would be in robotics. And it only works in discrete spaces.

Both of those limitations have been addressed in related work but they haven't been integrated with this algorithm afaik.

But I agree that deep network robot control is going to work decently well, very soon.

Comment by Seth Herd on AGI-Automated Interpretability is Suicide · 2023-08-24T17:39:34.610Z · LW · GW

Any type of self-improvement in an un-aligned AGI = death. And if it's already better than human level, it might not even need to do a bit of self-improvement, just escape our control, and we're dead. So I think "suicide" is quite a bit of hyperbole, or at least stated poorly relative to the rest of the conceptual landscape here.

If the AGI is aligned when it self-improves with algorithmic refinement, reflective stability should probably cause it to stay aligned after, and we just have a faster benevolent superintelligence.

So this concern is one more route to self-improvement. And there's a big question of how good a route it is.

My points were:

  1. Learning is at least as important as runtime speed. Refining networks to algorithms helps with one but destroys the other.
  2. Writing poems, and most cognitive activity, will very likely not resolve to a more efficient algorithm like arithmetic does. Arithmetic is a special case; perception and planning in varied environments require broad semantic connections. Networks excel at those. Algorithms do not.

So I take this to be a minor, not a major, concern for alignment, relative to others.

Comment by Seth Herd on AGI-Automated Interpretability is Suicide · 2023-08-21T19:40:46.364Z · LW · GW

Sorry it took me so long to get back to this; I either missed it or didn't have time to respond. I still don't, so I'll just summarize:

You're saying that what NNs do could be made a lot more efficient by distilling it into algorithms.

I think you're right about some cognitive functions but not others. That's enough to make your argument accurate, so I suggest you focus on that in future iterations. (Maybe going from suicide to adding danger would be more accurate.)

I suggest this change because I think you're wrong about a majority of cognition. The brain isn't being inefficient in most of what it does. You've chosen arithmetic as your example. I totally agree that the brain performs arithmetic in a wildly inefficient way. But that establishes one end of a spectrum. The intuition that most of cognition could be vastly optimized with algorithms is highly debatable. After a couple of decades of working with NNs and thinking about how they perform human cognition, I have the opposite intuition: NNs are quite efficient (this isn't to say that they couldn't be made more efficient - surely they can!).

For instance, I'm pretty sure that humans use a Monte Carlo tree search algorithm to solve novel problems and do planning. That core search structure can be simplified as an algorithm.

But the power of our search process comes from having excellent estimates of the semantic linkages between the problem and possible leaves in the tree, and excellent predictors of likely reward for each branch. Those estimates are provided by large networks with good learning rules. Those can't be compressed into an algorithm particularly efficiently; neural network distillation would probably work about as efficiently as it's possible to work. There are large computational costs because it's a hard problem, not because the brain is approaching the problem in an inefficient way.
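To make that split concrete, here is a minimal sketch of a tree search in which the search loop is a short fixed algorithm and all of the "knowledge" sits in a separate value estimator. Everything in it is an assumption for illustration: the toy problem (reach `TARGET` using "add1" or "double" moves), the names `value_estimate`, `Node`, `ucb`, and `search`, and the choice to score leaves with a value estimate rather than random rollouts. The hand-written heuristic is only a stand-in for a large learned network; this is not a claim about how the brain or any particular system implements search.

```python
import math

TARGET = 10                      # hypothetical goal state for the toy problem
MAX_DEPTH = 5                    # hypothetical planning horizon
ACTIONS = ("add1", "double")     # hypothetical action set


def step(state, action):
    """Toy transition function: add one or double the current number."""
    return state + 1 if action == "add1" else state * 2


def value_estimate(state):
    """Stand-in for a learned value network (assumption: a hand-made heuristic).

    Returns a score in (0, 1]; closer to TARGET is better.
    """
    return 1.0 / (1.0 + abs(TARGET - state))


class Node:
    def __init__(self, state, depth):
        self.state = state
        self.depth = depth
        self.children = {}       # action -> Node
        self.visits = 0
        self.total = 0.0         # sum of backed-up values


def ucb(parent, child, c=1.4):
    """UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")      # always try unvisited children first
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(root_state, n_iter=200):
    """Run the search loop and return the most-visited root action."""
    root = Node(root_state, 0)
    for _ in range(n_iter):
        node, path = root, [root]
        # Selection: follow the highest-UCB child until reaching a leaf.
        while node.children:
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch))
            path.append(node)
        # Expansion: add children unless the planning horizon is reached.
        if node.depth < MAX_DEPTH:
            for action in ACTIONS:
                node.children[action] = Node(step(node.state, action), node.depth + 1)
        # Evaluation: ask the estimator instead of simulating to the end.
        value = value_estimate(node.state)
        # Backup: propagate the value to every node on the path.
        for visited in path:
            visited.visits += 1
            visited.total += value
    return max(root.children, key=lambda a: root.children[a].visits)


if __name__ == "__main__":
    print(search(3))
```

The loop itself is a few dozen lines and would not change if `value_estimate` were swapped for a large network; in this sketch, the quality of the plan depends almost entirely on how good that estimator is.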

I'm not sure if that helps to convey my very different intuition or not. Like I said, I've got limited time. I'm hoping to convey my reaction to this post, in hopes it will clarify your future efforts. My reaction was "OK, good point, but it's hardly 'suicide' to provide just one more route to self-improvement". I think the crux is the intuition of how much of cognition can be made more efficient with an algorithm over a neural net. And I think most readers will share my intuition that it's a small subset of cognition that can be made much more efficient in algorithms.

One reason is the usefulness of learning. NNs provide a way to constantly and efficiently improve the computation through learning. Unless there's an equally efficient way to do that in closed-form algorithms, they have a massive disadvantage in any area where more learning is likely to be useful. Here again, arithmetic is the exception that suggests a rule. Arithmetic is a closed cognitive function; we know exactly how it works and don't need to learn more. Ways of solving new, important problems benefit massively from new learning.

Comment by Seth Herd on Memetic Judo #2: Incorporal Switches and Levers Compendium · 2023-08-16T18:36:13.920Z · LW · GW

This is a fascinating argument, and it's shifting my perspective on plausible timelines to AGI risk.

I think you're absolutely right about current systems. But there are no guarantees for how long this is true. The amount of compute necessary to run a better-than-human AGI is hotly debated and highly debatable. (ASI isn't necessary for real threats).

This is probably still true for the next ten years, but I'm not sure it goes even that long. Algorithmic improvements have been doubling efficiency about every 18 months since the spread of network approaches; even if that doesn't keep up, they will continue, and Moore's law (or at least Kurzweil's law) will probably keep going almost as fast as it is.

That's on the order of five doublings of compute and five doublings of algorithmic efficiency (assuming some slowdown). Ten doublings in total is a world with roughly a thousand times more space-for-intelligence, and it seems plausible that a slightly-smarter-than-human AGI could steal enough to rent adequate compute, and hide successfully while still operating at adequate speed to outmaneuver the rest of the world.
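The arithmetic behind that estimate can be checked in a few lines. This is a rough sketch under the comment's own assumptions (a ten-year window, an 18-month doubling time, and five doublings each for compute and algorithms after allowing for slowdown); the function names `doublings` and `growth_factor` are made up for illustration, and none of the inputs are measured values.

```python
def doublings(years, doubling_time_years):
    """Number of doublings that fit in a time window at a constant rate."""
    return years / doubling_time_years


def growth_factor(compute_doublings, algorithmic_doublings):
    """Combined multiplier when the two kinds of doubling compound."""
    return 2 ** (compute_doublings + algorithmic_doublings)


# With no slowdown, an 18-month doubling time gives about 6.7 doublings per decade.
no_slowdown = doublings(10, 1.5)

# Assumption from the text: round down to five doublings each to allow for slowdown.
combined = growth_factor(5, 5)

print(round(no_slowdown, 1))  # 6.7
print(combined)               # 1024
```

So "a thousand times" corresponds to ten compounding doublings (2^10 = 1024). With no slowdown at all, the same sketch gives about 13 doublings, or roughly ten thousand times.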

How much intelligence is necessary to outsmart humanity? I'd put the lower bound at just above human intelligence. And I'd say that GPT-5, properly scaffolded to agency, might be adequate.

If algorithmic or compute improvements slow down, or if I'm wrong about how much intelligence is dangerous, we've got longer. And we've probably got a little longer, since those are pretty minimal thresholds.

Does that sound roughly right?

Comment by Seth Herd on George Hotz vs Eliezer Yudkowsky AI Safety Debate - link and brief discussion · 2023-08-16T17:54:42.085Z · LW · GW

Based on this summary, I think both of these guys are making weak and probably-wrong central arguments. Which is weird.

Yudkowsky thinks it's effectively impossible to align network-based AGI. He thinks this is as obvious as the impossibility of perpetual motion. This is far from obvious to me, or to all the people working on aligning those systems. If Yudkowsky thinks this is so obvious, why isn't he explaining it better? He's good at explaining things.

Yudkowsky's theory that it's easier to align algorithmic AGI seems counterintuitive to me, and at the very least, unproven. Algorithms aren't more interpretable than networks, and by similar logic, they're probably not easier to align. Specifying the arrangements of atoms that qualify as human flourishing is not obviously easier with algorithms than with networks.

This is a more complex argument, and largely irrelevant: we aren't likely to build algorithmic AGIs by the time we build network-based AGIs. This makes Eliezer despair, but I don't think his logic holds up on this particular point.

Hotz's claim that, if multiple unaligned ASIs can't coordinate, humans might play them off against each other, is similar. It could be true, but it's probably not going to happen. It seems like in that scenario, it's far more likely for one or some coalition of smarter ASIs to play the dumber humans against other ASIs successfully. Hoping that the worst player wins in a multipolar game seems like a forlorn hope.

Comment by Seth Herd on George Hotz vs Eliezer Yudkowsky AI Safety Debate - link and brief discussion · 2023-08-16T17:30:37.386Z · LW · GW

It seems like this only guarantees security along some particular vector. Which might be why current software doesn’t actually use this type of security.

And if you did actually close off software security as a threat model from ASI, wouldn’t it just choose a different, physical attack mode?