Comments
Rather than make things worse as a means of compelling others to make things better, I would rather just make things better.
Brinksmanship and accelerationism (in the Marxist sense) are high variance strategies ill-suited to the stakes of this particular game.
[one way this makes things worse is stimulating additional investment on the frontier; another is attracting public attention to the wrong problem, which will mostly just generate action on solutions to that problem, and not to the problem we care most about. Importantly, the contingent of people-mostly-worried-about-jobs are not yet our allies, and it’s likely their regulatory priorities would not address our concerns, even though I share in some of those concerns.]
Ah, I think this just reads like you don't think of romantic relationships as having any value proposition beyond the sexual, other than those you listed (which are Things but not The Thing, where The Thing is some weird discursive milieu). Also the tone you used for describing the other Things is as though they are traps that convince one, incorrectly, to 'settle', rather than things that could actually plausibly outweigh sexual satisfaction.
Different people place different weight on sexual satisfaction (for a lot of different reasons, including age).
I'm mostly just trying to explain all the disagree votes. I think you'll get the most satisfying answer to your actual question by having a long chat with one of your asexual friends (as something like a control group, since the value of sex to them is always 0 anyway, so whatever their cause is for having romantic relationships is probably the kind of thing that you're looking for here).
I read your comment as conflating 'talking about the culture war at all' and 'agreeing with / invoking Curtis Yarvin', which also conflates 'criticizing Yarvin' with 'silencing discussion of the culture war'.
This reinforces a false binary between totally mind-killed wokists and people (like Yarvin) who just literally believe that some folks deserve to suffer, because it's their genetic destiny.
This kind of tribalism is exactly what fuels the culture war, and not what successfully sidesteps, diffuses, or rectifies it. NRx, like the Cathedral, is a mind-killing apparatus, and one can cautiously mine individual ideas presented by either side, on the basis of the merits of that particular idea, while understanding that there is, in fact, very little in the way of a coherent model underlying those claims. Or, to the extent that there is such a model, it doesn't survive (much) contact with reality.
[it feels useful for me to point out that Yarvin has ever said things I agree with, and that I'm sympathetic to some of the main-line wokist positions, to avoid the impression that I'm merely a wokist cosplaying centrism; in fact, the critiques of wokism I find most compelling are the critiques that come from the left, but it's also true that Yarvin has some views here that are more in contact with reality]
edit: I agree that people should say things they believe and be engaged with in good faith (conditional on them, themselves, engaging in good faith)
I think you're saying something here but I'm going to factor it a bit to be sure.
- "not exactly hard-hitting"
- "not... at all novel"
- "not... even interesting"
- "not even criticisms of the humanities"
One and three I'm just going to call 'subjective' (and I think I would just agree with you if the Wikipedia article were actually representative of the contents of the book, which it is not).
Re 4: The book itself is actually largely about his experiences as a professor, being subjected to the forces of elite coordination and bureaucracy, and reads a lot like Yarvin's critiques of the Cathedral (although Fisher identifies these as representative of a pseudo-left).
Re 2: The novelty comes from the contemporaneity of the writing. Fisher is doing a very early-20th century Marxist thing of actually talking about one's experience of the world, and relating that back to broader trends, in plain language. The world has changed enough that the work has become tragically dated, and I personally wouldn't recommend it to people who aren't already somewhat sympathetic to his views, since its strength around the time of its publication (that contemporaneity) has, predictably, become its weakness.
The work that does more of the thing testingthewaters is gesturing toward, imo, is Exiting the Vampire Castle. The views expressed in this work are directly upstream of his death: his firm (and early) rebuke of cancel culture and identity politics precipitated rejection and bullying from other leftists on Twitter, deepening his depression. He later killed himself.
Important note if you actually read the essay: he's setting his sights on phenomena similar to those Yarvin targets, but identifies the cause differently // he is a leftist talking to other leftists, so he uses terms like 'capital' in a valenced way. I think the utility of this work, for someone who is not part of the audience he is critiquing, is that it shows the left has an answer at all to the phenomena Yarvin and Ngo are calling out; that they're not, wholesale, oblivious to these problems and, in fact, the principal divide in the contemporary left is between those who reject the Cathedral and those who seek to join it.
(obligatory "Nick Land was Mark Fisher's dissertation advisor.")
(I basically endorse Daniel and Habryka's comments, but wanted to expand the 'it's tricky' point about donation. Obviously, I don't know what they think, and they likely disagree on some of this stuff.)
There are a few direct-work projects that seem robustly good (METR, Redwood, some others) based on track record, but afaict they're not funding constrained.
Most incoming AI safety researchers are targeting working at the scaling labs, which doesn't feel especially counterfactual or robust against value drift, from my position. For this reason, I don't think prosaic AIS field-building should be a priority investment (and Open Phil is prioritizing this anyway, so marginal value per dollar is a good deal lower than it was a few years ago).
There are various governance things happening, but much of that work is pretty behind the scenes.
There are also comms efforts, but the community as a whole has only been spinning up capacity in this direction for ~a year, and hasn't really had any wild successes beyond a few well-placed op-eds (and the jury's out on whether / which direction these moved the needle).
Comms is a devilishly difficult thing to do well, and many fledgling efforts I've encountered in this direction are not in the hands of folks whose strategic capacities I especially trust. I could talk at length about possible comms failure modes if anyone has questions.
I'm very excited about Palisade and Apollo, which are both, afaict, somewhat funding constrained in the sense that they have fewer people than they should, and the people currently working there are working for less money than they could get at another org, because they believe in the theory of change over other theories of change. I think they should be better supported than they are currently, on a raw dollars level (but this may change in the future, and I don't know how much money they need to receive in order for that to change).
I am not currently empowered to make a strong case for donating to MIRI using only publicly available information, but that should change by the end of this year, and the case to be made there may be quite strong. (I say this because you may click my profile and see I work at MIRI, and so it would seem a notable omission from my list if I didn't mention why it's omitted; reasons for donating to MIRI exist, but they're not public, and I wouldn't feel right trying to convince anyone of that, especially when I expect it to become pretty obvious later).
I don't know how much you know about AI safety and the associated ecosystem but, from my (somewhat pessimistic, non-central) perspective, many of the activities in the space are likely (or guaranteed, in some instances) to have the opposite of their stated intended impact. Many people will be happy to take your money and tell you it's doing good, but knowing that it is doing good by your own lights (as opposed to doing evil or, worse, doing nothing*) is the hard part. There is ~no consensus view here, and no single party that I would trust to make this call with my money without my personal oversight (which I would also aim to bolster through other means, in advance of making this kind of call).
*this was a joke. Don't Be Evil.
Preliminary thoughts from Ryan Greenblatt on this here.
[errant thought pointing a direction, low-confidence musing, likely retreading old ground]
There’s a disagreement that crops up in conversations about changing people’s minds. Sides are roughly:
- You should explain things by walking someone through your entire thought process, as it actually unfolded. Changing minds is best done by offering an account of how your own mind was changed.
- You should explain things by back-chaining the most viable (valid) argument, from your conclusions, with respect to your specific audience.
This first strategy invites framing your argument around the question “How did I come to change my mind?”, and the second invites framing your argument around the question “How might I change my audience’s mind?”. I am sometimes characterized as advocating for approach 2, and have never actually taken that to be my position.

I think there’s a third approach here, which will look to advocates of approach 1 as if it were approach 2, and look to advocates of approach 2 as if it were approach 1. That is, you should frame the strategy around the question “How might my audience come to change their mind?”, and then not even try to change it yourself.
This third strategy is about giving people handles and mechanisms that empower them to update based on evidence they will encounter in the natural course of their lives, rather than trying to do all of the work upfront. Don’t frame your own position as some competing argument in the marketplace of ideas; hand your interlocutor a tool, tell them what they might expect, and let their experience confirm your predictions.

I think this approach has a few major differences from the other two approaches, from the perspective of its impact:
- It requires much less authority. (Strength!)
- It can be executed in a targeted, light-weight fashion. (Strength!)
- It’s less likely to slip into deception than option 2, and less confrontational than option 1. (Strength!)
- Even if it works, they won’t end up thinking exactly what you think. (Weakness?)
- …but they’ll be better equipped to make sense of new evidence. (Strength!)
- Plausibly more memetically fit than option 1 or 2. (A failure mode of 1 is that your interlocutor won’t be empowered to stand up to criticism while spreading the ideas, even to people very much like themselves; for option 2, it’s that they will ONLY succeed in spreading the idea to people who are like themselves, since they only know the argument that works on them.)
I think Eliezer has talked about some version of this in the past, and this is part of why people like predictions in general, but I think pasting a prediction at the end of an argument built around strategy 1 or 2 isn't actually Doing The Thing I mean here.
Friends report Logan's writing strongly has this property.
Do you think of rationality as a similar sort of 'object' or 'discipline' to philosophy? If not, what kind of object do you think of it as being?
(I am no great advocate for academic philosophy; I left that shit way behind ~a decade ago after going quite a ways down the path. I just want to better understand whether folks consider Rationality as a replacement for philosophy, a replacement for some of philosophy, a subset of philosophical commitments, a series of cognitive practices, or something else entirely. I can model it, internally, as aiming to be any of these things, without other parts of my understanding changing very much, but they all have 'gaps', where there are things that I associate with Rationality that don't actually naturally fall out of the core concepts as construed as any of these types of category [I suppose this is the 'being a subculture' x-factor]).
Question for Ben:
Are you inviting us to engage with the object level argument, or are you drawing attention to the existence of this argument from a not-obviously-unreasonable-source as a phenomenon we are responsible for (and asking us to update on that basis)?
On my read, he’s not saying anything new (concerns around military application are why ‘we’ mostly didn’t start going to the government until ~2-3 years ago), but that he’s saying it, while knowing enough to paint a reasonable-even-to-me picture of How This Thing Is Going, is the real tragedy.
I think the reason nobody will do anything useful-to-John as a result of the control critique post is that control is explicitly not aiming at the hard parts of the problem, and knows this about itself. In that way, control is an especially poorly selected target if the goal is getting people to do anything useful-to-John. I'd be interested in a similar post on the Alignment Faking paper (or model organisms more broadly), on RAT, on debate, on faithful CoT, on specific interpretability paradigms (circuits v SAEs, vs some coherentist approach vs shards vs....), and would expect those to have higher odds of someone doing something useful-to-John. But useful-to-John isn't really the metric I think the field should be using, either....
I'm kind of picking on you here because you are least guilty of this failing relative to researchers in your reference class. You are actually saying anything at all, sometimes with detail, about how you feel about particular things. However, you wouldn't be my first-pick judge for what's useful; I'd rather live in a world where like half a dozen people in your reference class are spending non-zero time arguing about the details of the above agendas and how they interface with your broader models, so that the researchers working on those things can update based on those critiques (there may even be ways for people to apply the vector implied by y'all's collective input, and generate something new / abandon their doomed plans).
"there are plenty of cases where we can look at what people are doing and see pretty clearly that it is not progress toward the hard problem"
There are plenty of cases where John can glance at what people are doing and see pretty clearly that it is not progress toward the hard problem.
Importantly, people with the agent foundations class of anxieties (which I embrace; I think John is worried about the right things!) do not spend time engaging on a gears level with prominent prosaic paradigms and connecting the high level objection ("it ignores the hard part of the problem") with the details of the research.
"But Tsvi and John actually spend a lot of time doing this."
No, they don't! They paraphrase the core concern over and over again, often seemingly without reading the paper. I don't think reading the paper would change your minds (nor should it!), but I think that there's a culture problem tied to this off-hand dismissal of prosaic work that disincentivizes potential agent foundations (or some similar new thing that shares the core concerns of agent foundations) researchers from engaging with, e.g., John.
Prosaic work is fraught, and much of it is doomed. New researchers over-index on tractability because short feedback loops are comforting ('street-lighting'). Why aren't we explaining why that is, on the terms of the research itself, rather than expecting people to be persuaded by the same high-level point getting hammered into them again and again?
I've watched this work in real-time. If you listen to someone talk about their work, or read their paper and follow up in person, they are often receptive to a conversation about worlds in which their work is ineffective, evidence that we're likely to be in such a world, and even to shifting the direction of their work in recognition of that evidence.
Instead, people with their eye on the ball are doing this tribalistic(-seeming) thing.
Yup, the deck is stacked against humanity solving the hard problems; for some reason, folks who know that are also committed to playing their hands poorly, and then blaming (only) the stacked deck!
John's recent post on control is a counter-example to the above claims and was, broadly, a big step in the right direction, but had some issues with it, as raised by Redwood in the comments, which are a natural consequence of it being ~a new thing John was doing. I look forward to more posts like that in the future, from John and others, that help new entrants to empirical work (which has a robust talent pipeline!) understand, integrate, and even pivot in response to, the hard parts of the problem.
[edit: I say 'gears level' a couple times, but mean 'more in the direction of gears-level than the critiques that have existed so far']
If you had written this exact post, it would have been upvoted enough for the Redwood team to see it, and they would have engaged with you much as they engaged with John here (modulo some familiarity, because these people all know each other at least somewhat, and in some pairs very well actually).
If you wrote several posts like this, that were of some quality, you would lose the ability to appeal to your own standing as a reason not to write a post.
This is all I'm trying to transmit.
[edit: I see you already made the update I was encouraging, an hour after leaving the above comment to me. Yay!]
Writing (good) critiques is, in fact, a way many people gain standing. I’d push back on the part of you that thinks all of your good ideas will be ignored (some of them probably will be, but not all of them; don’t know until you try, etc).
More partial credit on the second to last point:
https://home.treasury.gov/news/press-releases/jy2766
Aside: I don’t think it’s just that real world impacts take time to unfold. Lately I’ve felt that evals are only very weakly predictive of impact (because making great ones is extremely difficult). Could be that models available now don’t have substantially more mundane utility (economic potential stemming from first order effects), outside of the domains the labs are explicitly targeting (like math and code), than models available 1 year ago.
Is the context on “reliable prediction and ELK via empirical route” just “read the existing ELK literature and actually follow it” or is it stuff that’s not written down? I assume you’ve omitted it to save time, and so no worries if the latter.
EDIT: I was slightly tempted to think of this also as ‘Ryan’s ranking of live agendas that aren’t control’, but I’m not sure if ‘what you expect to work conditional on delegating to AIs’ is similar to ‘what you expect to work if humans are doing most of it?’ (my guess is the lists would look similar, but with notable exceptions, eg humans pursuing GOFAI feels less viable than ML agents pursuing GOFAI)
My understanding is that ~6 months ago y’all were looking for an account of the tasks an automated AI safety researcher would hopefully perform, as part of answering the strategic question ‘what’s the next step after building [controlled] AGI?’ (with ‘actually stop there indefinitely’ being a live possibility)
This comment makes me think you’ve got that account of safety tasks to be automated, and are feeling optimistic about automated safety research.
Is that right and can you share a decently mechanistic account of how automated safety research might work?
[I am often skeptical of, to straw man the argument, ‘make ai that makes ai safe’, got the sense Redwood felt similarly, and now expect this may have changed.]
Thanks for the clarification — this is in fact very different from what I thought you were saying, which was something more like "FATE-esque concerns fundamentally increase x-risk in ways that aren't just about (1) resource tradeoffs or (2) side-effects of poorly considered implementation details."
Anthropic should take a humanist/cosmopolitan stance on risks from AGI in which risks related to different people having different values are very clearly deprioritized compared to risks related to complete human disempowerment or extinction, as worry about the former seems likely to cause much of the latter
Can you say more about the section I've bolded or link me to a canonical text on this tradeoff?
[was a manager at MATS until recently and want to flesh out the thing Buck said a bit more]
It’s common for researchers to switch subfields, and extremely common for MATS scholars to get work doing something different from what they did at MATS. (Kosoy has had scholars go on to ARC, Neel scholars have ended up in scalable oversight, Evan’s scholars have a massive spread in their trajectories; there are many more examples but it’s 3 AM.)
Also, I wouldn’t advise applying only to whichever stream seems most interesting; I’d advise applying to literally everything (unless you know for sure you don’t want to work with Neel, since his application is very time-intensive). The acceptance rate is ~4 percent, so better to maximize your odds (again, for most scholars, the bulk of the value is not in their specific research output over the 10-week period, but in having the experience at all).
Also please see Ryan’s replies to Tsvi on the talent needs report for more notes on the street lighting concern as it pertains to MATS. There’s a pretty big back and forth there (I don’t cleanly agree with one side or the other, but it might be useful to you).
Your version of events requires a change of heart (for 'them to get a whole lot more serious'). I'm just looking at the default outcome. Whether alignment is hard or easy (although not if it's totally trivial), it appears to be progressing substantially more slowly than capabilities (and the parts of it that are advancing are the most capabilities-synergizing, so it's unclear what the oft-lauded 'differential advancement of safety' really looks like).
By bad I mean dishonest, and by 'we' I mean the speaker (in this case, MIRI).
I take myself to have two central claims across this thread:
- Your initial comment was straw manning the 'if we build [ASI], we all die' position.
- MIRI is likely not a natural fit to consign itself to service as the neutral mouthpiece of scientific consensus.
I do not see where your most recent comment has any surface area with either of these claims.
I do want to offer some reassurance, though:
I do not take "One guy who's thought about this for a long time and some other people he recruited think it's definitely going to fail" to be descriptive of the MIRI comms strategy.
Oh, I feel fine about saying ‘draft artifacts currently under production by the comms team ever cite someone who is not Eliezer, including experts with a lower p(doom)’ which, based on this comment, is what I take to be the goalpost. This is just regular coalition signaling though and not positioning yourself as, terminally, a neutral observer of consensus.
“You haven’t really disagreed that [claiming to speak for scientific consensus] would be more effective.”
That’s right! I’m really not sure about this. My experience has been that ~every take someone offers to normies in policy is preceded by ‘the science says…’, so maybe the market is kind of saturated here. I’d also worry that precommitting to only argue in line with the consensus might bind you to act against your beliefs (and I think EY et al have valuable inside-view takes that shouldn’t be stymied by the trends of an increasingly-confused and poisonous discourse). That something is a local credibility win (I’m not sure if it is, actually) doesn’t mean it’s got the best nth order effects among all options long-term (including on the dimension of credibility).
I believe that Seth would find messaging that did this more credible. I think ‘we’re really not sure’ is a bad strategy if you really are sure, which MIRI leadership, famously, is.
I do mean ASI, not AGI. I know Pope + Belrose also mean to include ASI in their analysis, but it’s still helpful to me if we just use ASI here, so I’m not constantly wondering if you’ve switched to thinking about AGI.
Obligatory ‘no really, I am not speaking for MIRI here.’
My impression is that MIRI is not trying to speak for anyone else. Representing the complete scientific consensus is an undue burden to place on an org that has not made that claim about itself. MIRI represents MIRI, and is one component voice of the ‘broad view guiding public policy’, not its totality. No one person or org is in the chair with the lever; we’re all just shouting what we think in directions we expect the diffuse network of decision-makers to be sitting in, with more or less success. It’s true that ‘claiming to represent the consensus’ is a tack one can take to appear authoritative, and not (always) a dishonest move. To my knowledge, this is not MIRI’s strategy. This is the strategy of, e.g., the CAIS letter (although not of CAIS as a whole!), and occasionally AIS orgs cite expert consensus or specific, otherwise-disagreeing experts as having directional agreement with the org (for an extreme case, see Yann LeCun shortening his timelines). This is not the same as attempting to draw authority from the impression that one’s entire aim is simply ‘sharing consensus.’
And then my model of Seth says ‘Well we should have an org whose entire strategy is gathering and sharing expert consensus, and I’m disappointed that this isn’t MIRI, because this is a better strategy,’ or else cites a bunch of recent instances of MIRI claiming to represent scientific consensus (afaik these don’t exist, but it would be nice to know if they do). It is fair for you to think MIRI should be doing a different thing. Imo MIRI’s history points away from it being a good fit to take representing scientific consensus as its primary charge (and this is, afaict, part of why AI Impacts was a separate project).
I think MIRI comms are by and large well sign-posted to indicate ‘MIRI thinks x’ or ‘Mitch thinks y’ or ‘Bengio said z.’ If you think a single org should build influence and advocate for a consensus view then help found one, or encourage someone else to do so. This just isn’t what MIRI is doing.
Good point - what I said isn’t true in the case of alignment by default.
Edited my initial comment to reflect this
(I work at MIRI but views are my own)
I don't think 'if we build it we all die' requires that alignment be hard [edit: although it is incompatible with alignment by default]. It just requires that our default trajectory involves building ASI before solving alignment (and, looking at our present-day resource allocation, this seems very likely to be the world we are in, conditional on building ASI at all).
[I want to note that I'm being very intentional when I say "ASI" and "solving alignment" and not "AGI" and "improving the safety situation"]
What text analogizing LLMs to human brains have you found most compelling?
Does it seem likely to you that, conditional on ‘slow bumpy period soon’, a lot of the funding we see at frontier labs dries up (so there’s kind of a double slowdown effect of ‘the science got hard, and also now we don’t have nearly the money we had to push global infrastructure and attract top talent’), or do you expect that frontier labs will stay well funded (either by leveraging low hanging fruit in mundane utility, or because some subset of their funders are true believers, or a secret third thing)?
Only the first few sections of the comment were directed at you; the last bit was a broader point re other commenters in the thread, the fooming shoggoths, and various in-person conversations I’ve had with people in the bay.
That rationalists and EAs tend toward aesthetic bankruptcy is one of my chronic bones to pick, because I do think it indicates the presence of some bias that doesn’t exist in the general population, which results in various blind spots.
Sorry for not signposting and/or limiting myself to a direct reply; that was definitely confusing.
I think you should give 1 or 2 a try, and would volunteer my time (although if you’d find a betting structure more enticing, we could say my time is free iff I turn out to be wrong, and otherwise you’d pay me).
If this is representative of the kind of music you like, I think you’re wildly overestimating how difficult it is to make that music.
The hard parts are basically infrastructural (knowing how to record a sound, how to make different sounds play well together in a virtual space). Suno is actually pretty bad at that, though, so if you give yourself the affordance to be bad at it, too, then you can just ignore the most time-intensive part of music making.
Pasting things together (as you did here), is largely The Way Music Is Made in the digital age, anyway.
I think, in ~one hour you could:
- Learn to play the melody of this song on piano.
- Learn to use some randomizer tools within a DAW (some of which may be ML-based), and learn the fundamentals of that DAW, as well as just enough music theory to get by (nothing happening in the above Suno piece would take more than 10 minutes of explanation to understand).
The actual arrangement of the Suno piece is somewhat ambitious (not in that it does anything hard, just in that it has many sections), but this was the part you had to hack together yourself anyway, and getting those features in a human-made song is more about spending the time to do it, than it is about having the skill (there is a skill to doing an awesome job of it, but Suno doesn’t have that skill either).
Suno’s outputs are detectably bad to me and all of my music friends, even the e/acc or ai-indifferent ones, and it’s a significant negative update for me on the broader perceptual capacities of our community that so many folks here prefer Suno to music made by humans.
A great many tools like this already exist and are contracted by the major labels.
When you post a song to streaming services, it’s checked against the entire major label catalog before actually listing on the service (the technical process is almost certainly not literally this, but it’s something like this, and they’re very secretive about what’s actually happening under the hood).
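(If you're curious what "something like this" could look like mechanically, here's a toy, illustration-only sketch of fingerprint-style catalog matching. Every function name, feature, and threshold below is invented; I have no inside knowledge of any service's actual pipeline.)

```python
# Toy sketch only: invented features and thresholds, not any real service's pipeline.
from collections import defaultdict

def fingerprint(features):
    """Hash (freq_a, freq_b, time_delta) landmarks into compact fingerprints."""
    return {hash(landmark) for landmark in features}

class Catalog:
    def __init__(self):
        self.index = defaultdict(set)  # fingerprint hash -> set of track ids

    def add(self, track_id, features):
        for h in fingerprint(features):
            self.index[h].add(track_id)

    def match(self, features, threshold=0.3):
        """Return catalog tracks sharing at least `threshold` of the upload's fingerprints."""
        fps = fingerprint(features)
        counts = defaultdict(int)
        for h in fps:
            for track_id in self.index.get(h, ()):
                counts[track_id] += 1
        return [t for t, c in counts.items() if c / max(len(fps), 1) >= threshold]

# A real front-end would extract robust landmarks from audio (e.g. spectral peaks);
# here they're just made-up tuples.
catalog = Catalog()
catalog.add("label_track_001", [(440, 880, 3), (660, 990, 5), (220, 330, 2)])
print(catalog.match([(440, 880, 3), (660, 990, 5), (100, 200, 9)]))  # ['label_track_001']
```

The load-bearing idea is just that matching runs over compact, perceptually robust hashes of the catalog rather than raw audio, which is why it can happen at upload time against an enormous catalog.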
Cool! I think we're in agreement at a high level. Thanks for taking the extra time to make sure you were understood.
In more detail, though:
I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think 'if we pause it will be for stupid reasons' is a very sad take.
I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on development is ~make-work that goes away when you get a drop-in replacement for remote workers, and also yes, AI coding is already an accelerant // effectively doing gradient descent on gradient descent (RLing the RL'd researcher to RL the RL...) is intelligence-explosion fuel. But I think there's a big gap between the capabilities you need for politically worrisome levels of unemployment, and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring $200k+/year ML engineers to replace your $30k/year call center employee is only just now starting to make sense economically). I think this has been true of current tech since ~GPT-4, and that we haven't seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it's not an explosion).
I take "depending on how concentrated AI R&D is" to foreshadow that you'd reply to the above with something like: "This is about lab priorities; the labs with the most impressive models are the labs focusing the most on frontier model development, and they're unlikely to set their sights on comprehensive automation of shit jobs when they can instead double-down on frontier models and put some RL in the RL to RL the RL that's been RL'd by the..."
I think that's right about lab priorities. However, I expect the automation wave to mostly come from middle-men, consultancies, what have you, who take the leftover ML researchers not eaten up by the labs and go around automating things away individually. (Yes, maybe the frontier moves too fast for this to be right, because the labs just end up with a drop-in remote worker 'for free' as long as they keep advancing down the tech tree. But I don't quite think that's true: human jobs are human-shaped, and buyers are going to want pretty rigorous role-specific guarantees from whoever's selling this service, even if those guarantees are basically unnecessary, and the one-size-fits-all solution is going to have fewer buyers than the thing marketed as 'bespoke'.)
In general, I don't like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don't know what we're going to get. 'By the time we'll have x, we'll certainly have y' is not a form of prediction that anyone has a particularly good track record making.
So for this argument to be worth bringing up in some general context where a pause is discussed, the person arguing it should probably believe:
- We are far and away most likely to get a pause only as a response to unemployment.
- An AI that precipitates pause-inducing levels of unemployment is inches from automating AI R+D.
- The period between implementing the pause and massive algorithmic advancements is long enough that we're able to increase compute stock...
- ....but short enough that we're not able to make meaningful safety progress before algorithmic advancements make the pause ineffective (because, e.g., we regulated FLOPs and it now takes 100x fewer FLOPs to build the dangerous thing).
I think the conjunct probability of all these things is low, and I think their likelihood is sensitive to the terms of the pause agreement itself. I agree that the design of a pause should consider a broad range of possibilities, and try to maximize its own odds of attaining its ends (Keep Everyone Alive).
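(To make "low conjunct probability" concrete, here's the toy arithmetic I have in mind, with purely illustrative numbers rather than my actual credences for the four conditions above.)

```python
# Illustrative numbers only, not my actual credences.
conditions = {
    "pause only arrives via unemployment": 0.5,
    "that same AI is inches from automating AI R&D": 0.5,
    "pause lasts long enough to grow the compute stock": 0.5,
    "...but not long enough for meaningful safety progress": 0.5,
}

joint = 1.0
for p in conditions.values():
    joint *= p

print(f"joint probability, treating the conditions as independent: {joint:.3f}")  # 0.062
```

Correlations among the conditions push that number up some, but even generous per-condition odds leave the conjunction well short of 'likely'.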
I'm also not sure how this goes better in the no-pause world? Unless this person also has really high odds on multipolar going well and expects some Savior AI trained and aligned in the same length of time as the effective window of the theoretical pause to intervene? But that's a rare position among people who care about safety ~at all; it's kind of a George Hotz take or something...
(I don't think we disagree; you did flag this as "...somewhat relevant in worlds where...", which is often code for "I really don't expect this to happen, but Someone Somewhere should hold this possibility in mind." Just want to make sure I'm actually following!)
I've just read this post and the comments. Thank you for writing that; some elements of the decomposition feel really good, and I don't know that they've been done elsewhere.
I think discourse around this is somewhat confused, because you actually have to do some calculation on the margin, and need a concrete proposal to do that with any confidence.
The straw-Pause rhetoric is something like "Just stop until safety catches up!" The overhang argument is usually deployed (as it is in those comments) to the effect of 'there is no stopping.' And yeah, in this calculation, there are in fact marginal negative externalities to the implementation of some subset of actions one might call a pause. The straw-Pause advocate really doesn't want to look at that, because it's messy to entertain counter-evidence to your position, especially if you don't have a concrete enough proposal on the table to assign weights in the right places.
Because it's so successful against straw-Pausers, the anti-pause people bring in the overhang argument like an absolute knockdown, when it's actually just a footnote to double check the numbers and make sure your pause proposal avoids slipping into some arcane failure mode that 'arms' overhang scenarios. That it's received as a knockdown is reinforced by the gearsiness of actually having numbers (and most of these conversations about pauses are happening in the abstract, in the absence of, i.e., draft policy).
But... just because your interlocutor doesn't have the numbers at hand, doesn't mean you can't have a real conversation about the situations in which compute overhang takes on sufficient weight to upend the viability of a given pause proposal.
You said all of this much more elegantly here:
"Arguments that overhangs are so bad that they outweigh the effects of pausing or slowing down are basically arguing that a second-order effect is more salient than the first-order effect. This is sometimes true, but before you've screened this consideration off by examining the object-level, I think your prior should be against."
...which feels to me like the most important part. The burden is on folks introducing an argument from overhang risk to prove its relevance within a specific conversation, rather than just introducing the adversely-gearsy concept to justify safety-coded accelerationism and/or profiteering. Everyone's prior should be against actions Waluigi-ing, by default (while remaining alert to the possibility!).
I think it would be very helpful to me if you broke that sentence up a bit more. I took a stab at it but didn't get very far.
Sorry for my failure to parse!
I want to say yes, but I think this might be somewhat more narrow than I mean. It might be helpful if you could list a few other ways one might read my message, that seem similarly-plausible to this one.
Folks using compute overhang to 4D chess their way into supporting actions that differentially benefit capabilities.
I'm often tempted to comment this in various threads, but it feels like a rabbit hole, it's not an easy one to convince someone of (because it's an argument they've accepted for years), and I've had relatively little success talking about this with people in person (there's some change I should make in how I'm talking about it, I think).
More broadly, I've started using quick takes to catalog random thoughts, because sometimes when I'm meeting someone for the first time, they have heard of me, and are mistaken about my beliefs, but would like to argue against their straw version. Having a public record I can point to of things I've thought feels useful for combatting this.
Yes this world.
Please stop appealing to compute overhang. In a world where AI progress has wildly accelerated chip manufacture, this already-tenuous argument has become ~indefensible.
Sometimes people express concern that AIs may replace them in the workplace. This is (mostly) silly. Not that it won't happen, but you've gotta break some eggs to make an industrial revolution. This is just 'how economies work' (whether or not they can / should work this way is a different question altogether).
The intrinsic fear of joblessness-resulting-from-automation is tantamount to worrying that curing infectious diseases would put gravediggers out of business.
There is a special case here, though: double digit unemployment (and youth unemployment, in particular) is a major destabilizing force in politics. You definitely don't want an automation wave so rapid that the jobless and nihilistic youth mount a civil war, sharply curtailing your ability to govern the dangerous technologies which took everyone's jobs in the first place.
As AI systems become more expensive and more powerful, and pressure to deploy them profitably increases, I'm fairly concerned that we'll see a massive hollowing out of many white-collar professions, resulting in substantial civil unrest, violence, chaos. I'm not confident that we'll get (e.g.) a UBI (or that it would meaningfully change the situation even if we did), and I'm not confident that there's enough inertia in existing economic structures to soften the blow.
The IMF estimates that current tech (~GPT-4 at launch) can automate ~30% of human labor performed in the US. That's a big, scary number. About half of these jobs, they imagine, are the kinds of things you always want more of anyway, such that complementarity will just drive production in that 15% of cases. The other 15%, though, probably just stops existing as jobs altogether (for various reasons, I think a 9:1 replacement rate is more likely than full automation with current tech).
This mostly isn't happening yet because you need an ML engineer to commit Serious Time to automating away That Job In Particular. ML engineers are expensive, and usually not specced for the kind of client-facing work that this would require (i.e. breaking down tasks that are part of a job, knowing what parts can be automated, and via what mechanisms, be that purpose-built models, fine-tuning, a prompt library for a human operator, some specialized scaffolding...). There's just a lot of friction and lay-speak necessary to accomplish this, and it's probably not economically worth it for some subset of necessary parties (ML engineers can make more elsewhere than small business owners can afford to pay them to automate things away, for instance).
So we've got a bottleneck, and on the other side of it, this speculative 15% leap in unemployment. That 15% potential leap, though, is climbing as capabilities increase (this is tautologically true; "drop-in replacement for a remote worker" is one major criterion used in discussions about AI progress).
I don't expect 15% unemployment to destabilize the government (Great Depression peak was 25%, which is a decent lower bound on 'potentially dangerous' levels of unemployment in the US). But I do expect that 15% powder keg to grow in size, and potentially cross into dangerous territory before it's lit.
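(Back-of-envelope version of the arithmetic above, with every figure a hand-wavy placeholder rather than a forecast:)

```python
# Hand-wavy placeholders, not a forecast.
exposed_share = 0.30        # IMF-style estimate: share of US labor exposed to current tech
complementary_share = 0.5   # roughly half of exposed work gets augmented rather than displaced
replacement_rate = 0.9      # of the displaceable half, ~9-in-10 workers replaced rather than all

displaceable = exposed_share * (1 - complementary_share)       # 0.15
potential_unemployment_bump = displaceable * replacement_rate  # ~0.135

print(f"displaceable share of labor:   {displaceable:.0%}")
print(f"potential unemployment bump:   {potential_unemployment_bump:.1%}")
print("Great Depression peak, for scale: 25%")
```

The 9:1 adjustment shaves the headline 15% down to something like 13.5%, and the whole stack sits under the ~25% benchmark; the term that worries me is exposed_share, since it's the one that climbs with capabilities.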
Previously, I'd actually arrived at that 30% number myself (almost exactly one year ago), but I had initially expected:
- Labs would devote substantial resources to this automation, and it would happen more quickly than it has so far.
- All of these jobs were just on the chopping block (frankly, I'm not sure how much I buy the complementarity argument, but I am An Internet Crank, and they are the International Monetary Fund, so I'll defer to them).
These two beliefs made the situation look much more dire than I now believe it to be, but it's still, I claim, worth entertaining as A Way This Whole Thing Could Go, especially if we're hitting a capabilities plateau, and especially if we're doubling down on government intervention as our key lever in obviating x-risk.
[I'm not advocating for a centrally planned automation schema, to be clear; I think these things have basically never worked, but would like to hear counterexamples. Maybe just like... a tax on automation to help staunch the flow of resources into the labs and their surrogates, a restructuring of unemployment benefits and retraining programs and, before any of that, a more robust effort to model the economic consequences of current and future systems than the IMF report that just duplicates the findings of some idiot (me) frantically reviewing BLS statistics in the winter of 2023.]
I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don't 'think differently', in a structural sense, from their forebears; they just don't believe in that God anymore.
The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.
They're no less tribal or dogmatic, or more critical, than the place they came from. They just vote the other way and can maybe talk about one or two levels of abstraction beyond the stereotype they identify against (although they can't really think about those levels).
You should still be nice to them, and honest with them, but you should understand what you're getting into.
The mere biographical detail of having a religious background or being religious isn't a strong mark against someone's thinking on other topics, but it is a sign you may be talking to a member of a certain meta-intellectual culture, and need to modulate your style. I have definitely had valuable conversations with people that firmly belong in this category, and would not categorically discourage engagement. Just don't be so surprised when the usual jutsu falls flat!
I agree with this in the world where people are being epistemically rigorous/honest with themselves about their timelines and where there's a real consensus view on them. I've observed that it's pretty rare for people to make decisions truly grounded in their timelines, or to do so only nominally, and I think there's a lot of social signaling going on when (especially younger) people state their timelines.
I appreciate that more experienced people are willing to give advice within a particular frame ("if timelines were x", "if China did y", "if Anthropic did z", "If I went back to school", etc etc), even if they don't agree with the frame itself. I rely on more experienced people in my life to offer advice of this form ("I'm not sure I agree with your destination, but admit there's uncertainty, and love and respect you enough to advise you on your path").
Of course they should voice their disagreement with the frame (and I agree this should happen more for timelines in particular), but to gate direct counsel on urgent, object-level decisions behind the resolution of background disagreements is broadly unhelpful.
When someone says "My timelines are x, what should I do?", I actually hear like three claims:
- Timelines are x
- I believe timelines are x
- I am interested in behaving as though timelines are x
Evaluation of the first claim is complicated and other people do a better job of it than I do so let's focus on the others.
"I believe timelines are x" is a pretty easy roll to disbelieve. Under relatively rigorous questioning, nearly everyone (particularly everyone 'career-advice-seeking age') will either say they are deferring (meaning they could just as easily defer to someone else tomorrow), or admit that it's a gut feel, especially for their ~90 percent year, and especially for more and more capable systems (this is more true of ASI than weak AGI, for instance, although those terms are underspecified). Still others will furnish 0 reasoning transparency and thus reveal their motivations to be principally social (possibly a problem unique to the bay, although online e/acc culture has a similar Thing).
"I am interested in behaving as though timelines are x" is an even easier roll to disbelieve. Very few people act on their convictions in sweeping, life-changing ways without concomitant benefits (money, status, power, community), including people within AIS (sorry friends).
With these uncertainties, piled on top of the usual uncertainties surrounding timelines, I'm not sure I'd want anyone to act so nobly as to refuse advice to someone with different timelines.
If Alice is a senior AIS professional who gives advice to undergrads at parties in Berkeley (bless her!), how would her behavior change under your recommendation? It sounds like maybe she would stop fostering a diverse garden of AIS saplings and instead become the awful meme of someone who just wants to fight about a highly speculative topic. Seems like a significant value loss.
Their timelines will change some other day; everyone's will. In the meantime, being equipped to talk to people with a wide range of safety-concerned views (especially for more senior, or just Older people), seems useful.
"harder to converge"
Converge for what purpose? It feels like the marketplace of ideas is doing an ok job of fostering a broad portfolio of perspectives. If anything, we are too convergent and, as a consequence, somewhat myopic internally. Leopold mind-wormed a bunch of people until Tegmark spoke up (and that only somewhat helped). Few thought governance was a good idea until pretty recently (~3 years ago), and it would be going better if those interested in the angle hadn't been shouted down so emphatically to begin with.
If individual actors need to cross some confidence threshold in order to act, but the reasonable confidence interval is in fact very wide, I'd rather have a bunch of actors with different timelines, which roughly sum to the shape of the reasonable thing*, than have everyone working on the same overconfident assumption that later comes back to bite us (when we've made mistakes in the past, this is often why).
*Which is, by the way, closer to flat than most people's individual timelines
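(A toy illustration of that footnote, with made-up numbers: several sharply peaked individual timelines, centered in different years, mix into a much flatter aggregate.)

```python
# Made-up numbers, purely to illustrate the footnote above.
import numpy as np

years = np.arange(2025, 2046)

def peaked_timeline(center, width=1.5):
    """A sharply peaked distribution over arrival years, normalized to sum to 1."""
    w = np.exp(-0.5 * ((years - center) / width) ** 2)
    return w / w.sum()

individuals = [peaked_timeline(c) for c in (2027, 2030, 2034, 2040)]
aggregate = np.mean(individuals, axis=0)

print("most mass any individual puts on a single year:", round(float(max(p.max() for p in individuals)), 2))
print("most mass the aggregate puts on a single year: ", round(float(aggregate.max()), 2))
```

No individual curve looks flat, but the mixture is much closer to it, which is roughly what I mean by 'summing to the shape of the reasonable thing'.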
I don't think I really understood what it meant for establishment politics to be divisive until this past election.
As good as it feels to sit on the left and say "they want you to hate immigrants" or "they want you to hate queer people", it seems similarly (although probably not equally?) true that the center left also has people they want you to hate (the religious, the rich, the slightly-more-successful-than-you, the ideologically-impure-who-once-said-a-bad-thing-on-the-internet).
But there's also a deeper, structural sense in which it's true.
Working on AIS, I've long hoped that we could form a coalition with all of the other people worried about AI, because a good deal of them just... share (some version of) our concerns, and our most ambitious policy solutions (e.g. stopping development, mandating more robust interpretability and evals) could also solve a bunch of problems highlighted by the FATE community, the automation-concerned, etc etc.
Their positions also have the benefit of conforming to widely-held anxieties ('I am worried AI will just be another tool of empire', 'I am worried I will lose my job for banal normie reasons that have nothing to do with civilizational robustness', 'I am worried AI's will cheaply replace human labor and do a worse job, enshittifying everything in the developed world'). We could generally curry popular support and favor, without being dishonest, by looking at the Venn diagram of things we want and things they want (which would also help keep AI policy from sliding into partisanship, if such a thing is still possible, given the largely right-leaning associations of the AIS community*).
For the next four years, at the very least, I am forced to lay this hope aside. That the EO contained language in service of the FATE community was, in hindsight, very bad, and probably foreseeably so, given that even moderate Republicans like to score easy points on culture war bullshit. Probably it will be revoked, because language about bias made it an easy thing for Vance to call "far left".
"This is ok because it will just be replaced."
Given the current state of the game board, I don't want to be losing any turns. We've already lost too many turns; setbacks are unacceptable.
"What if it gets replaced by something better?"
I envy your optimism. I'm also concerned about the same dynamic playing out in reverse; what if the new EO (or piece of legislation via whatever mechanism), like the old EO, contains some language that is (to us) beside the point, but nonetheless signals partisanship, and is retributively revoked or repealed by the next administration? This is why you don't want AIS to be partisan; partisanship is dialectics without teleology.
Ok, so structurally divisive: establishment politics has made it ~impossible to form meaningful coalitions around issues other than absolute lightning rods (e.g. abortion, immigration; the 'levers' available to partisan hacks looking to gin up donations). It's not just that they make you hate your neighbors, it's that they make you behave as though you hate your neighbors, lest your policy proposals get painted with the broad red brush and summarily dismissed.
I think this is the kind of observation that leads many experienced people interested in AIS to work on things outside of AIS, but with an eye toward implications for AI (e.g. Critch, A Ray). You just have these lucid flashes of how stacked the deck really is, and set about digging the channel that is, compared to the existing channels, marginally more robust to reactionary dynamics ('aligning the current of history with your aims' is maybe a good image).
Hopefully undemocratic regulatory processes serve their function as a backdoor for the sensible, but it's unclear how penetrating the partisanship will be over the next four years (and, of course, those at the top are promising that it will be Very Penetrating).
*I am somewhat ambivalent about how right-leaning AIS really is. Right-leaning compared to middle class Americans living in major metros? Probably. Tolerant of people with pretty far-right views? Sure, to a point. Right of the American center as defined in electoral politics (e.g. 'Republican-voting')? Usually not.
I think the key missing piece you’re pointing at (making sure that our interpretability tools etc actually tell us something alignment-relevant) is one of the big things going on in model organisms of misalignment (iirc there’s a step that’s like ‘ok, but if we do interpretability/control/etc at the model organism does that help?’). Ideally this type of work, or something close to it, could become more common // provide ‘evals for our evals’ // expand in scope and application beyond deep deception.
If that happened, it seems like it would fit the bill here.
Does that seem true to you?
I like this post, but I think Redwood has varied some on whether control is for getting alignment work out of AIs vs. getting generally good-for-humanity work out of them and pushing for a pause once they reach some usefulness/danger threshold (e.g. well before superintelligence).
[based on my recollection of Buck seminar in MATS 6]
Makes sense. Pretty sure you can remove it (and would appreciate that).
Many MATS scholars go to Anthropic (source: I work there).
Redwood I’m really not sure, but that could be right.
Sam now works at Anthropic.
Palisade: I’ve done some work for them, I love them, I don’t know that their projects so far inhibit Anthropic (BadLlama, which I’m decently confident was part of the cause for funding them, was pretty squarely targeted at Meta, and is their most impactful work to date by several OOM). In fact, the softer versions of Palisade’s proposal (highlighting misuse risk, their core mission), likely empower Anthropic as seemingly the most transparent lab re misuse risks.
I take the thrust of your comment to be “OP funds safety, do your research”. I work in safety; I know they fund safety.
I also know most safety projects differentially benefit Anthropic (this fact is independent of whether you think differentially benefiting Anthropic is good or bad).
If you can make a stronger case for any of the other of the dozens of orgs on your list than exists for the few above, I’d love to hear it. I’ve thought about most of them and don’t see it, hence why I asked the question.
Further: the goalpost is not ‘net positive with respect to TAI x-risk.’ It is ‘not plausibly a component of a meta-strategy targeting the development of TAI at Anthropic before other labs.’
Edit: use of the soldier mindset flag above is pretty uncharitable here; I am asking for counter-examples to a hypothesis I’m entertaining. This is the actual opposite of soldier mindset.
updated, thanks!
The CCRU is under-discussed in this sphere as a direct influence on the thoughts and actions of key players in AI and beyond.
Land started a creative collective, alongside Mark Fisher, in the 90s. I learned this by accident, and it seems like a corner of intellectual history that’s at least as influential as, e.g., the extropians.
If anyone knows of explicit connections between the CCRU and contemporary phenomena (beyond Land/Fisher’s immediate influence via their later work), I’d love to hear about them.
Does anyone have examples of concrete actions taken by Open Phil that point toward their AIS plan being anything other than ‘help Anthropic win the race’?
I think a non-zero number of those disagree votes would not have appeared if the same comment were made by someone other than an Anthropic employee, based on seeing how Zac is sometimes treated IRL. My comment is aimed most directly at the people who cast those particular disagree votes.
I agree with your comment to Ryan above that those who identified "Anthropic already does most of these" as "the central part of the comment" were using the disagree button as intended.
The threshold for hitting the button will be different in different situations; I think the threshold many applied here was somewhat low, and a brief look at Zac's comment history, to me, further suggests this.