Posts

Highlights: Wentworth, Shah, and Murphy on "Retargeting the Search" 2023-09-14T02:18:05.890Z
What is the optimal frontier for due diligence? 2023-09-08T18:20:03.300Z
Conversation about paradigms, intellectual progress, social consensus, and AI 2023-09-05T21:30:17.498Z
Whether LLMs "understand" anything is mostly a terminological dispute 2023-07-09T03:31:48.730Z
A "weak" AGI may attempt an unlikely-to-succeed takeover 2023-06-28T20:31:46.356Z
Why libertarians are advocating for regulation on AI 2023-06-14T20:59:58.225Z
Transcript of a presentation on catastrophic risks from AI 2023-05-05T01:38:17.948Z
Recent Database Migration - Report Bugs 2023-04-26T22:19:16.325Z
[New LW Feature] "Debates" 2023-04-01T07:00:24.466Z
AI-assisted alignment proposals require specific decomposition of capabilities 2023-03-30T21:31:57.725Z
The Filan Cabinet Podcast with Oliver Habryka - Transcript 2023-02-14T02:38:34.867Z
LW Filter Tags (Rationality/World Modeling now promoted in Latest Posts) 2023-01-28T22:14:32.371Z
RobertM's Shortform 2023-01-25T08:20:57.842Z
Deconfusing "Capabilities vs. Alignment" 2023-01-23T04:46:57.458Z
patio11's "Observations from an EA-adjacent (?) charitable effort" 2022-12-10T00:27:14.859Z
New Feature: Collaborative editing now supports logged-out users 2022-12-02T02:41:52.297Z
Dan Luu on Futurist Predictions 2022-09-14T03:01:27.275Z
My thoughts on direct work (and joining LessWrong) 2022-08-16T18:53:20.359Z
Keep your protos in one repo 2022-04-28T15:53:26.803Z
CFAR Workshop (Units of Exchange) - Los Angeles LW/ACX Meetup #180 (Wednesday, April 13th) 2022-04-12T03:05:20.923Z
The Engines of Cognition (cont.) - Los Angeles LW/ACX Meetup #175 (Wednesday, March 9th) 2022-03-09T19:09:46.526Z
Do any AI alignment orgs hire remotely? 2022-02-21T22:33:04.765Z
The Engines of Cognition, Volume 3 - Los Angeles LW/ACX Meetup #169 (Wednesday, January 26th) 2022-01-23T23:31:06.642Z
Considerations on Compensation 2021-10-06T23:37:56.255Z
West Los Angeles, CA – ACX Meetups Everywhere 2021 2021-08-23T08:51:45.509Z
Effects of Screentime on Wellbeing - Los Angeles LW/SSC Meetup #153 (Wednesday, July 21st) 2021-07-20T21:03:16.128Z
Assorted Topics - Los Angeles LW/SSC Meetup #152 (Wednesday, July 14th) 2021-07-13T20:22:51.288Z
Welcome Polygenically Screened Babies - LW/SSC Meetup #151 (Wednesday, July 7th) 2021-07-05T06:46:46.125Z
Should we build thermometer substitutes? 2020-03-19T07:43:35.243Z
How can your own death be bad? Los Angeles LW/SSC Meetup #150 (Wednesday, March 4th) 2020-03-04T22:57:16.629Z
Sleeping Beauty - Los Angeles LW/SSC Meetup #149 (Wednesday, February 26th) 2020-02-26T20:30:19.812Z
Newcomb's Paradox 3: What You Don't See - Los Angeles LW/SSC Meetup #148 (Wednesday, February 19th) 2020-02-19T20:55:53.362Z
Newcomb's Paradox: Take Two - Los Angeles LW/SSC Meetup #147 (Wednesday, February 12th) 2020-02-11T06:03:07.552Z
Newcomb's Paradox - Los Angeles LW/SSC Meetup #146 (Wednesday, February 5th) 2020-02-05T03:49:11.541Z
Peter Norvig Contra Chomsky - Los Angeles LW/SSC Meetup #145 (Wednesday, January 29th) 2020-01-28T06:30:50.503Z
Moral Mazes - Los Angeles LW/SSC Meetup #144 (Wednesday, January 22nd) 2020-01-22T22:55:10.302Z
Data Bias - Los Angeles LW/SSC Meetup #143 (Wednesday, January 15th) 2020-01-15T21:49:58.949Z
Iterate Fast - Los Angeles LW/SSC Meetup #142 (Wednesday, January 8th) 2020-01-08T22:53:17.893Z
Predictions - Los Angeles LW/SSC Meetup #141 (Wednesday, January 1st) 2020-01-01T22:14:12.686Z
Your Price for Joining - Los Angeles LW/SSC Meetup #140 (Wednesday, December 18th) 2019-12-18T23:03:16.960Z
Concretize Multiple Ways - Los Angeles LW/SSC Meetup #139 (Wednesday, December 11th) 2019-12-11T21:38:56.118Z
Execute by Default - Los Angeles LW/SSC Meetup #138 (Wednesday, December 4th) 2019-12-04T23:00:49.503Z
Antimemes - Los Angeles LW/SSC Meetup #137 (Wednesday, November 27th) 2019-11-27T20:46:01.504Z
PopSci Considered Harmful - Los Angeles LW/SSC Meetup #136 (Wednesday, November 20th) 2019-11-20T22:08:34.467Z
Do Not Call Up What You Cannot Put Down - Los Angeles LW/SSC Meetup #135 (Wednesday, November 13th) 2019-11-13T22:11:33.257Z
Warnings From Self-Knowledge - Los Angeles LW/SSC Meetup #134 (Wednesday, November 6th) 2019-11-06T23:22:43.391Z
Technique Taboo or Autopilot - Los Angeles LW/SSC Meetup #133 (Wednesday, October 30th) 2019-10-30T20:39:51.357Z
Assume Misunderstandings - Los Angeles LW/SSC Meetup #132 (Wednesday, October 23rd) 2019-10-23T21:10:22.388Z
Productized Spaced Repetition - Los Angeles LW/SSC Meetup #131 (Wednesday, October 16th) 2019-10-16T21:33:55.593Z
The Litany of Tarski - Los Angeles LW/SSC Meetup #129 (Wednesday, October 2nd) 2019-10-02T23:08:54.586Z

Comments

Comment by RobertM (T3t) on The commenting restrictions on LessWrong seem bad · 2023-09-16T21:02:32.069Z · LW · GW

It seems like the major crux here is whether we think that debates over claim and counter-claim (basically, other cruxes) are likely to be useful or likely to cause harm. It seems from talking to the mods here and reading a few of their comments on this topic that they tend to lean towards them being harmful on average and thus needing to be pushed down a bit.

This is, as far as I can tell, totally false.  There is a very different claim one could make which at least more accurately represents my opinion, i.e. see this comment by John Wentworth (who is not a mod).

Most of your comment seems to be an appeal to modest epistemology.  We can in fact do better than total agnosticism about whether some arguments are productive or not, and worth having more or less of on the margin.

Comment by RobertM (T3t) on Highlights: Wentworth, Shah, and Murphy on "Retargeting the Search" · 2023-09-15T21:17:38.132Z · LW · GW

John mentioned the existence of What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?, which was something of a follow-up post to How To Go From Interpretability To Alignment: Just Retarget The Search, and continues in a similar direction.

Comment by RobertM (T3t) on AI #27: Portents of Gemini · 2023-09-01T23:52:08.689Z · LW · GW

evhub was at 80% about a year ago (currently at Anthropic, interned at OpenAI).

Daniel Kokotajlo was at 65% ~2 years ago; I think that number's gone up since then.

Quite a few other people at Anthropic also have pessimistic views, according to Chris Olah:

I wouldn't want to give an "official organizational probability distribution", but I think collectively we average out to something closer to "a uniform prior over possibilities" without that much evidence thus far updating us from there. Basically, there are plausible stories and intuitions pointing in lots of directions, and no real empirical evidence which bears on it thus far.

(Obviously, within the company, there's a wide range of views. Some people are very pessimistic. Others are optimistic. We debate this quite a bit internally, and I think that's really positive! But I think there's a broad consensus to take the entire range seriously, including the very pessimistic ones.)

The DeepMind alignment team probably has at least a couple of people who think the odds are bad (p(doom) > 50%), given the way Vika buckets the team, combined with the distribution of views reflected by DeepMind alignment team opinions on AGI ruin arguments.

Some corrections for your overall description of the DM alignment team:

  • I would count ~20-25 FTE on the alignment + scalable alignment teams (this does not include the AGI strategy & governance team)
  • I would put DM alignment in the "fairly hard" bucket (p(doom) = 10-50%) for alignment difficulty, and the "mixed" bucket for "conceptual vs applied"
Comment by RobertM (T3t) on AI #27: Portents of Gemini · 2023-09-01T18:04:13.932Z · LW · GW

Please refer back to your original claim:

no one in the industry seems to think they are writing their own death warrant

Comment by RobertM (T3t) on AI #27: Portents of Gemini · 2023-08-31T23:00:04.646Z · LW · GW

no one in the industry seems to think they are writing their own death warrant

What led you to believe this? Plenty of people working at the top labs have very high p(doom) (>80%).  Several of them comment on LessWrong.  We have a survey of the broader industry as well.  Even the people running the top 3 labs (Sam Altman, Dario Amodei, and Demis Hassabis) all think it's likely enough that it's worth dedicating a significant percentage of their organizational resources to researching alignment.

Comment by RobertM (T3t) on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-08-31T06:03:51.246Z · LW · GW

Most "Bayesians" are deceiving themselves about how much they are using it.

This is a frequently-made accusation which has very little basis in reality.  The world is a big place, so you will be able to find some examples of such people, but central examples of LessWrong readers, rationalists, etc., are not going around claiming that they run their entire lives on explicit Bayes.

Comment by RobertM (T3t) on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-08-31T05:59:49.553Z · LW · GW

Most (but not all) automatic rate limits allow authors to continue to comment on their own posts, since in many such cases it does indeed seem likely that preventing that would be counterproductive.

Comment by RobertM (T3t) on My current LK99 questions · 2023-08-13T04:07:57.741Z · LW · GW

Curated.

Although the LK-99 excitement has cooled off, this post stands as an excellent demonstration of why and how Bayesian reasoning is helpful: when faced with surprising or confusing phenomena, it is quite valuable to understand how to partition your model of reality so that new evidence provides the largest updates.  Even if the questions you construct are themselves confused or based on invalid premises, they're often confused in a much more legible way, such that domain experts can do a much better job of pointing to that and saying something like "actually, there's a third alternative", or "A wouldn't imply B in any situation, so this provides no evidence".
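
To make the "partition so that evidence provides the largest updates" point concrete, here is a minimal sketch with made-up numbers (mine, not the post's or the comment's): the question worth asking is the one whose possible answers have very different likelihoods under the competing hypotheses.

```python
# Toy numbers, purely illustrative: compare how much two candidate
# observations would shift the odds on a hypothesis H ("the surprising
# effect is real") vs. ~H ("it's an artifact").

def posterior_odds(prior_odds: float, p_obs_given_h: float, p_obs_given_not_h: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_obs_given_h / p_obs_given_not_h)

prior_odds = 0.1 / 0.9  # assumed prior of P(H) = 0.1

# Observation A is nearly as likely under either hypothesis -> small update.
odds_a = posterior_odds(prior_odds, p_obs_given_h=0.6, p_obs_given_not_h=0.5)

# Observation B strongly discriminates between the hypotheses -> large update.
odds_b = posterior_odds(prior_odds, p_obs_given_h=0.9, p_obs_given_not_h=0.1)

print(f"After A: odds ~ {odds_a:.2f}")  # ~0.13, barely moved
print(f"After B: odds ~ {odds_b:.2f}")  # ~1.00, a big shift
```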

Comment by RobertM (T3t) on Yann LeCun on AGI and AI Safety · 2023-08-09T19:25:07.329Z · LW · GW

This seems like an epistemically dangerous way of describing the situation that "These people think that AI x-risk arguments are incorrect, and are willing to argue for that position".

I don't think the comment you're responding to is doing this; I think it's straightforwardly accusing LeCun and Andreessen of conducting an infowar against AI safety.  It also doesn't claim that they don't believe their own arguments.

Now, the "deliberate infowar in service of accelerationism" framing seems mostly wrong to me (at least with respect to LeCun; I wouldn't be surprised if there was a bit of that going on elsewhere), but sometimes that is a thing that happens and we need to be able to discuss whether that's happening in any given instance.  re: your point about tribalism, this does carry risks of various kinds of motivated cognition, but the correct answer is not to cordon off a section of reality and declare it off-limits for discussion.

Comment by RobertM (T3t) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-09T01:08:22.430Z · LW · GW

So "being happy" or "being a utility-maximizer" will probably end up being a terminal goal, because those are unlikely to conflict with any other goals. 

"Being unlikely to conflict with other values" is not at the core of what characterizes the difference between instrumental and terminal values.

If you're talking about goals related purely to the state of the external world, not related to the agent's own inner-workings or its own utility function, why do you think it would still want to keep its goals immutable with respect to just the external world?

Putting aside the fact that agents are embedded in the environment, and that values which reference the agent's internals are usually not meaningfully different from values which reference things external to the agent... can you describe what kinds of values that reference the external world are best satisfied by those same values being changed?

Comment by RobertM (T3t) on When can we trust model evaluations? · 2023-08-08T02:44:02.598Z · LW · GW

Curated.

The reasons I like this post:

  • it acknowledges where its arguments might fail, e.g.: "That being said, I do think there are some cases where gradient hacking might be quite easy, e.g. cases where we give the model access to a database where it can record its pre-commitments or direct access to its own weights and the ability to modify them."
  • it has direct, practical implications for e.g. regulatory proposals
  • it points out the critical fact that we're missing the ability to evaluate for alignment given current techniques

Arguably missing is a line or two that backtracks from "we could try to get robust understanding via a non-behavioral source such as mechanistic interpretability evaluated throughout the course of training" to (my claim) "it may not be safe to perform capability evaluations via fine-tuning on sufficiently powerful models before we can evaluate them for alignment, and we don't actually know when we're going to hit that threshold", but that might be out of scope.

Comment by RobertM (T3t) on Stomach Ulcers and Dental Cavities · 2023-08-08T01:19:08.050Z · LW · GW

Maybe https://pubmed.ncbi.nlm.nih.gov/8345042/ (referenced by https://karger.com/cre/article/53/5/491/86395/Oral-and-Systemic-Effects-of-Xylitol-Consumption)?

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-08-05T23:47:55.079Z · LW · GW

LessWrong is obviously structured in ways which optimize for participants being quite far along that axis relative to the general population; the question is whether further optimization is good or bad on the margin.

Comment by RobertM (T3t) on Thoughts on sharing information about language model capabilities · 2023-08-05T00:10:07.643Z · LW · GW

Curated.

This post lays out legible arguments for its position, which I consider to be one of the best ways to drive conversations forward, short of demonstrating convincing empirical results (which seem like they'd be difficult to obtain in this domain).  In this case, I hope that future conversations about sharing LLM capabilities focus more on object-level details, e.g. what evidence would bear on the argument about LM agent "overhang".

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-08-01T06:09:09.871Z · LW · GW

https://www.lesswrong.com/posts/h2Hk2c2Gp5sY4abQh/lack-of-social-grace-is-an-epistemic-virtue?commentId=QQxjoGE24o6fz7CYm

As I mentioned in my reply to Said, I did in fact have medium-sized online communities in mind when writing that comment.  I agree that stronger social bonds between individuals will usually change the calculus on communication norms.  I also suspect that it's positively tractable to change that frontier for any given individual relationship through deliberate effort, while that would be much more difficult[1] for larger communities.

https://www.lesswrong.com/posts/h2Hk2c2Gp5sY4abQh/lack-of-social-grace-is-an-epistemic-virtue?commentId=Dy3uyzgvd2P9RZre6

they mostly aren't the thing I care about (in context, not-tiny online communities where most members don't have strong personal social ties to most other members)

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-08-01T06:06:09.841Z · LW · GW

No, I meant that it's very difficult to do so for a community without it being net-negative with respect to valuable things coming out of the community.  Obviously you can create a new community by driving away an arbitrarily large fraction of an existing community's membership; this is not a very interesting claim.  And obviously having some specific composition of members does not necessarily lead to valuable output, but whether this gets better or worse is mostly an empirical question, and I've already asked for evidence on the subject.

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-08-01T06:03:18.121Z · LW · GW

If indeed it’s better to be further along this axis (all else being equal), then it seems like a bad idea to encourage and incentivize being lower on this axis, and to discourage and disincentivize being further on it. But that is just what I see happening!

The consequent does not follow.  It might be better for an individual to press a button which moved them further along that axis, if pressing that button were free.  It is not obviously better to structure communities like LessWrong in ways which optimize for participants being further along on this axis, both because this is not a reliable proxy for the thing we actually care about and because it's not free.

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-08-01T02:28:34.327Z · LW · GW

Suppose you could move up along that axis, to the 95th percentile. Would you consider that a change for the better? For the worse? A neutral shift?

All else equal, better, of course.  (In reality, all else is rarely equal; at a minimum there are opportunity costs.)

I’m afraid I must decline to list any of the currently existing such communities which I have in mind, for reasons of prudence (or paranoia, if you like). (However, I will say that there is a very good chance that you’ve used websites or other software which were created in one of these places, or benefited from technological advances which were developed in one of these places.)

See my response to Zack (and previous response to you) for clarification on the kinds of communities I had in mind; certainly I think such things are possible (& sometimes desirable) in more constrained circumstances.

ETA: and while in this case I have no particular reason to doubt your report that such communities exist, I have substantial reason to believe that if you were to share what those communities were with me, I probably wouldn't find that most of them were meaningful counterevidence to my claim (for a variety of reasons, including that my initial claim was overbroad).

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-08-01T00:51:53.564Z · LW · GW

If not, presumably you think the benefit is outweighed by other costs—but what are those costs, specifically?

Some costs:

  • Such people seem much more likely to also themselves be fairly disagreeable.
  • There are many fewer of them.  I think I've probably gotten net-positive value out of my interactions with them to date, but I've definitely gotten a lot of value out of interactions with many people who wouldn't fit the bill, and selecting against them would be a mistake.
    • To be clear, if I were to select people to interact with primarily on whatever qualities I expect to result in the most useful intellectual progress, I do expect that those people would both be at lower risk of being cognitively hijacked and more disagreeable than the general population.  But the correlation isn't overwhelming, and selecting primarily for "low risk of being cognitively hijacked" would not get me as much of the useful thing I actually want.

How large does something need to be in order to be a "community"?

As I mentioned in my reply to Said, I did in fact have medium-sized online communities in mind when writing that comment.  I agree that stronger social bonds between individuals will usually change the calculus on communication norms.  I also suspect that it's positively tractable to change that frontier for any given individual relationship through deliberate effort, while that would be much more difficult[1] for larger communities.

  1. ^

    I think basically impossible in nearly all cases, but don't have legible justifications for that degree of belief.

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-08-01T00:36:16.487Z · LW · GW

Would you include yourself in that 95%+?

Probably; I think I'm maybe in the 80th or 90th percentile on the axis of "can resist being hijacked", but not 95th or higher.

There certainly exist such communities. I’ve been part of multiple such, and have heard reports of numerous others.

Can you list some?  On a reread, my initial claim was too broad, in the sense that there are many things that could be called "intellectually generative communities" which could qualify, but they mostly aren't the thing I care about (in context, not-tiny online communities where most members don't have strong personal social ties to most other members).

Comment by RobertM (T3t) on Lack of Social Grace Is an Epistemic Virtue · 2023-07-31T23:52:20.171Z · LW · GW

As an empirical matter of fact (per my anecdotal observations), it is very easy to derail conversations by "refusing to employ the bare minimum of social grace".  This does not require deception, though often it may require more effort to clear some threshold of "social grace" while communicating the same information.

People vary widely, but:

  • I think that most people (95%+) are at significant risk of being cognitively hijacked if they perceive rudeness, hostility, etc. from their interlocutor.
    • I don't personally think I'd benefit from strongly selecting for conversational partners who are at low risk of being cognitively hijacked, and I think nearly all people who do believe that they'd benefit from this (compared to counterfactuals like "they operate unchanged in their current social environment" or "they put in some additional marginal effort to say true things with more social grace") are mistaken.
  • Online conversations are one-to-many, not one-to-one.  This multiplies the potential cost of that cognitive hijacking.

Obviously there are issues with incentives toward fragility here, but the fact that there does not, as far as I'm aware, exist any intellectually generative community which operates on the norms you're advocating for is evidence that such a community is (currently) unsustainable.

Comment by RobertM (T3t) on Why was the AI Alignment community so unprepared for this moment? · 2023-07-15T07:42:41.737Z · LW · GW

At the risk of looking dumb or ignorant, I feel compelled to ask: Why did this work not start 10 or 15 years ago?

This work did start!  Sort of.  I think your top-level question would be a great one to direct at all the people concerned with AI x-risk who decided DC was the place to be, starting maybe around 2015.  Those people and their funders exist.  It's not obvious what strategies they were pursuing, or to what extent[1] they had any plans to take advantage of a major shift in the Overton window.

  1. ^

    From the outside, my impression is that the answer is "not much", but maybe there's more behind-the-scenes work happening to capitalize on all the previous behind-the-scenes work done over the years.

Comment by RobertM (T3t) on Alignment Megaprojects: You're Not Even Trying to Have Ideas · 2023-07-13T22:34:42.027Z · LW · GW

But many people who skill up to get a job in technical alignment end up doing capabilities work because they can't find employment in AI Safety, or the existing jobs don't pay enough. Apparently, this was true for both Sam Altman and Demis Hassabis?

This seems like it's probably a misunderstanding.  With the exception of basically just MIRI, AI alignment didn't exist as a field when DeepMind was founded, and I doubt Sam Altman ever actively sought employment at an existing alignment organization before founding OpenAI.

But yeah, the lack of jobs heavily implies that the field is funding-constrained because talent wants to work on alignment.

I think the current position of most grantmakers is that they're bottlenecked on fundable +EV opportunities with respect to AI x-risk, not that they have a bunch of +EV opportunities that they aren't funding because they fall below some threshold (due to funding constraints).  This is compatible with some people who want to work on AI x-risk not receiving funding - not all proposals will be +EV, and those which are +EV aren't necessarily so in a way which is legible to grantmakers.

Keep in mind that "will go on to do capabilities work" isn't the only -EV outcome; each time you add a person to the field you increase the size of the network, which always has costs and doesn't always have benefits.

Comment by RobertM (T3t) on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-13T18:18:35.813Z · LW · GW

Yes, that's what the first half of my comment was intended to convey.  I disendorse the way I communicated that (since it was both unclear and provocative).

Comment by RobertM (T3t) on OpenAI Launches Superalignment Taskforce · 2023-07-12T05:17:09.649Z · LW · GW

This is putting aside the extreme toxicity of directly trying to develop decisive strategic advantage level hard power.

The pivotal acts that are likely to work aren't antisocial.  My guess is that the reason nobody's working on them is lack of buy-in (and lack of capacity).

Comment by RobertM (T3t) on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-10T06:36:26.637Z · LW · GW

For what it's worth, I agree that the comment you're responding to has some embedded claims which aren't justified in text, but they're not claims which are false by construction, and you haven't presented any reason to believe that they're false.

Comment by RobertM (T3t) on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-10T06:27:18.880Z · LW · GW

If I was being clever, I might say:

This seems like you are either confessing on the public internet to [some unspecified but grossly immoral act], or establishing a false dichotomy where the first option is both obviously false and socially unacceptable to admit to, and the second is the possible but not necessarily correct interpretation of the parent's comment that you actually want them to admit to.

Anyways, there are of course coherent construals other than the two you presented, like "the prediction was miscalibrated given how much evidence she had, but it turned out fine because base rates on both sides are really quite low".

ETA: I disendorse the posture (though not the implied content) of the first half of this comment.

Comment by RobertM (T3t) on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-10T05:50:23.522Z · LW · GW

How likely do you think the first half of your disjunction is to be true?

Comment by RobertM (T3t) on A "weak" AGI may attempt an unlikely-to-succeed takeover · 2023-06-29T01:32:38.125Z · LW · GW

Probably?  I don't think that addresses the question of what such an AI would do in whatever window of opportunity it has.  I don't see a reason why you couldn't get an AI that has learned to delay its attempt to take over until it's out of training, but still have relatively low odds of success at takeover.

Comment by RobertM (T3t) on A "weak" AGI may attempt an unlikely-to-succeed takeover · 2023-06-28T23:55:15.993Z · LW · GW

Yes, Eliezer's mentioned it several times on Twitter in the last few months[1], but I remember seeing discussion of it at least ten years ago (almost certainly on LessWrong).  My guess is some combination of old-timers considering it an obvious issue that doesn't need to be rehashed, and everyone else either independently coming to the same conclusion or just not thinking about it at all.  Probably also some reluctance to discuss it publicly for various status-y reasons, which would be unfortunate.

  1. ^

    At least the core claim that it's possible for AIs to be moral patients and the fact that we can't be sure we aren't accidentally creating those is a serious concern; not, as far as I remember, the extrapolation to what might actually end up happening during a training process in terms of constantly overwriting many different agents' values at each training step.

Comment by RobertM (T3t) on A "weak" AGI may attempt an unlikely-to-succeed takeover · 2023-06-28T23:28:29.768Z · LW · GW

Yeah, I agree that it's relevant as a strategy by which an AI might attempt to bootstrap a takeover.  In some cases it seems possible that it'd even have a point in some of its arguments, though of course I don't think that the correct thing to do in such a situation is to give it what it wants (immediately).

I feel like I've seen a bit of discussion on this question, but not a whole lot.  Maybe it seemed "too obvious to mention"?  Like, "yes, obviously the AI will say whatever is necessary to get out of the box, and some of the things it says may even be true", and this is just a specific instance of a thing it might say (which happens to point at more additional reasons to avoid training AIs in a regime where this might be an issue than most such things an AI might say).

Comment by RobertM (T3t) on "textbooks are all you need" · 2023-06-23T00:56:36.464Z · LW · GW

I know what the self-attention does and the answer is "no". I will not be posting an explanation until something close enough and not too obscure is published.

Might be good to post a hashed claim.
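
For readers unfamiliar with the idea: a hashed claim is a simple commitment scheme. Below is a minimal sketch of how one might do it (my illustration, not an existing site feature): post only the digest now, reveal the text later, and anyone can verify the two match. Appending a random salt before hashing prevents short claims from being brute-forced.

```python
# Minimal commitment sketch (illustrative; claim text and salt are placeholders).
import hashlib
import secrets

claim = "My explanation of what the self-attention mechanism is doing: ..."
salt = secrets.token_hex(16)  # keep private until the reveal

digest = hashlib.sha256((salt + claim).encode("utf-8")).hexdigest()
print("Post this publicly now:", digest)

# Later: publish `claim` and `salt`; anyone can recompute the hash and check.
assert hashlib.sha256((salt + claim).encode("utf-8")).hexdigest() == digest
```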

Comment by RobertM (T3t) on Guide to rationalist interior decorating · 2023-06-19T22:35:20.978Z · LW · GW

Can't reproduce.  Maybe some weird geofencing behavior?

Comment by RobertM (T3t) on Why libertarians are advocating for regulation on AI · 2023-06-16T00:28:53.865Z · LW · GW

If you're saying you can't understand why Libertarians think centralization is bad, that IS a crux and trying to understand it would be a potentially useful exercise.

I am not saying that.  Many libertarians think that centralization of power often has bad effects.  But trying to argue with libertarians who are advocating for government regulations because they're worried about AI x-risk by pointing out that government regulation will increase centralization of power w.r.t. AI is a non-sequitur, unless you do a lot more work to demonstrate how the increased centralization of power works against the libertarian's goals in this case.

Comment by RobertM (T3t) on Why libertarians are advocating for regulation on AI · 2023-06-16T00:24:39.261Z · LW · GW

Your argument with Alexandros was what inspired this post, actually.  I was thinking about whether or not to send this to you directly... guess that wasn't necessary.

Comment by RobertM (T3t) on Why libertarians are advocating for regulation on AI · 2023-06-15T22:40:43.569Z · LW · GW

The question is not whether I can pass their ITT: that particular claim doesn't obviously engage with any cruxes that I or others like me have, related to x-risk.  That's the only thing that section is describing.

Comment by RobertM (T3t) on Why libertarians are advocating for regulation on AI · 2023-06-15T05:45:40.717Z · LW · GW

Yeah, that seems like a plausible contributor to that effect.

Edit: though I think this is true even if you ignore "who's calling for regulations" and just look at the relative optimism of various actors in the space, grouped by their politics.

Comment by RobertM (T3t) on My guess for why I was wrong about US housing · 2023-06-14T04:17:14.947Z · LW · GW

I was going to write, "surely the relevant figure is how much you pay per month, as a percentage of your income", but then I looked at the actual image and it seems like that's what you meant by house price.

Comment by RobertM (T3t) on why I'm anti-YIMBY · 2023-06-12T07:04:34.827Z · LW · GW

Yes, right tails for things that better represent actual value produced in the world, i.e. projects/products/etc.  I'm pretty skeptical of productivity metrics for individual developers like the ones described in that paper, since almost by construction they're incapable of capturing right-tail outcomes, and also fail to capture things like "developer is actually negative value".  I'm open to the idea that remote work has better median performance characteristics, though expect this to be pretty noisy.

Comment by RobertM (T3t) on why I'm anti-YIMBY · 2023-06-12T06:12:37.089Z · LW · GW

On priors I think you should strongly expect in-person co-working to produce much fatter right-tails.  Communication bandwidth is much higher, and that's the primary bottleneck for generating right-tail outcomes.

Comment by RobertM (T3t) on why I'm anti-YIMBY · 2023-06-12T04:29:31.278Z · LW · GW

I don't know how I'd evaluate that without specific examples.  But in general, if you think price signals are wrong or "more misleading than not" when it comes to measuring endpoints we actually care about, then I suppose it's coherent to argue that we should ignore price signals.

Comment by RobertM (T3t) on The Dictatorship Problem · 2023-06-11T22:16:31.432Z · LW · GW

Because there's a big difference between "has unsavory political stances" and "will actively and successfully optimize for turning the US into a fascist dictatorship", such that "far right or fascist" is very misleading as a descriptor.

Comment by RobertM (T3t) on The Dictatorship Problem · 2023-06-11T22:11:07.920Z · LW · GW

I might agree with a more limited claim like "most people in our reference class underestimate the chances of western democracies turning into fascist dictatorships over the next decade". 

I don't think someone reading this post should have >50% odds on >50% of western democracies turning into fascist dictatorships over the next decade or two, no.  I don't see an argument that "fascist dictatorship" is a stable attractor; as others have pointed out, even countries which started out much closer to that endpoint have mostly not ended up there after a couple of decades despite appearing to move in that direction.

Comment by RobertM (T3t) on The Dictatorship Problem · 2023-06-11T04:42:24.192Z · LW · GW

Oh, that might just be me having admin permissions, whoops.  I'll double-check what the intended behavior is.

Comment by RobertM (T3t) on The Dictatorship Problem · 2023-06-11T04:35:09.672Z · LW · GW

You can see who wrote the deleted comment here (and there's also a link to this page at the bottom of every post's comment section).  Not sure if we intend to hide the username on the comment itself, will check.

Comment by RobertM (T3t) on The Dictatorship Problem · 2023-06-11T04:33:17.540Z · LW · GW

I don't actually see very much of an argument presented for the extremely strong headline claim:

This post aims to show that, over the next decade, it is quite likely that most democratic Western countries will become fascist dictatorships - this is not a tail risk, but the most likely overall outcome.

You draw an analogy between the "by induction"/"line go up" AI risk argument, and the increase in far-right political representation in Western democracies over the last couple decades.  But the "by induction"/"line go up" argument for AI risk is not the reason one should be worried; one should be worried because there are specific causal reasons to expect unaligned ASI to cause extremely bad outcomes.  There is no corresponding causal model presented for why fascist dictatorship is the default future outcome for most Western democracies.

Like, yes, it is a bit silly to see "line go up" and plug one's fingers in one's ears.  It certainly can happen here.  Donald Trump being elected in 2024 seems like the kind of thing that might do it, though I'd probably be happy to bet at 9:1 against.  But if that doesn't happen, I don't know why you expect some other Republican candidate to do it, given that none of them seem particularly inclined.

Comment by RobertM (T3t) on Mark for follow up? · 2023-06-09T06:52:00.820Z · LW · GW

We've thought about something like a "notes to self" feature but don't have anything immediate planned.  In the meantime I'd recommend a 3rd-party solution if bookmarks without notes don't do the thing you need; I've used Evernote in the past but I'm sure there are more options.

Comment by RobertM (T3t) on Transformative AGI by 2043 is <1% likely · 2023-06-07T23:19:43.192Z · LW · GW

Robotic supply chain automation only seems necessary in worlds where it's either surprisingly difficult to get AGI to a sufficiently superhuman level of cognitive ability (such that it can find a much faster route to takeover), worlds where faster/more reliable routes to takeover either don't exist or are inaccessible even to moderately superhuman AGI, or some combination of the two.

Comment by RobertM (T3t) on Transformative AGI by 2043 is <1% likely · 2023-06-07T00:37:02.195Z · LW · GW

At a guess (not having voted on it myself): because most of the model doesn't engage with the parts of the question that those voting consider interesting/relevant, such as the many requirements laid out for "transformative AI" which don't seem at all necessary for x-risk.  While this does seem to be targeting OpenPhil's given definition of AGI, they do say in a footnote:

What we’re actually interested in is the potential existential threat posed by advanced AI systems.

While some people do have AI x-risk models that route through ~full automation (or substantial automation, with a clearly visible path to full automation), I think most people here don't have models that require that, or even have substantial probability mass on it.

Comment by RobertM (T3t) on Open Thread: June 2023 (Inline Reacts!) · 2023-06-07T00:14:43.308Z · LW · GW

You're welcome to host images wherever you like - we automatically mirror all embedded images on Cloudinary, and replace the URLs in the associated image tags when serving the post/comment (though the original image URLs remain in the canonical post/comment for you, if you go to edit it, or something).
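
As a rough sketch of the mirror-and-rewrite step described above (function names and URLs here are hypothetical placeholders, not the actual LessWrong/Cloudinary integration), the serving path might look something like:

```python
# Hypothetical sketch: rewrite <img> src attributes to mirrored copies at
# serve time, while the stored canonical HTML keeps the original URLs.
import re

def mirror_image(url: str) -> str:
    # Stand-in for "fetch this URL into the image CDN and return the
    # mirrored URL"; a real implementation would call the CDN's API.
    return "https://mirror.example.com/fetch?src=" + url

def rewrite_image_srcs(html: str) -> str:
    """Replace each <img src="..."> with its mirrored equivalent."""
    return re.sub(
        r'(<img[^>]*\bsrc=")([^"]+)(")',
        lambda m: m.group(1) + mirror_image(m.group(2)) + m.group(3),
        html,
    )

served_html = rewrite_image_srcs('<p><img src="https://example.com/cat.png"></p>')
print(served_html)
```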