Posts

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm) 2024-04-23T03:58:43.443Z
The LessWrong 2022 Review: Review Phase 2023-12-22T03:23:49.635Z
Highlights: Wentworth, Shah, and Murphy on "Retargeting the Search" 2023-09-14T02:18:05.890Z
What is the optimal frontier for due diligence? 2023-09-08T18:20:03.300Z
Conversation about paradigms, intellectual progress, social consensus, and AI 2023-09-05T21:30:17.498Z
Whether LLMs "understand" anything is mostly a terminological dispute 2023-07-09T03:31:48.730Z
A "weak" AGI may attempt an unlikely-to-succeed takeover 2023-06-28T20:31:46.356Z
Why libertarians are advocating for regulation on AI 2023-06-14T20:59:58.225Z
Transcript of a presentation on catastrophic risks from AI 2023-05-05T01:38:17.948Z
Recent Database Migration - Report Bugs 2023-04-26T22:19:16.325Z
[New LW Feature] "Debates" 2023-04-01T07:00:24.466Z
AI-assisted alignment proposals require specific decomposition of capabilities 2023-03-30T21:31:57.725Z
The Filan Cabinet Podcast with Oliver Habryka - Transcript 2023-02-14T02:38:34.867Z
LW Filter Tags (Rationality/World Modeling now promoted in Latest Posts) 2023-01-28T22:14:32.371Z
RobertM's Shortform 2023-01-25T08:20:57.842Z
Deconfusing "Capabilities vs. Alignment" 2023-01-23T04:46:57.458Z
patio11's "Observations from an EA-adjacent (?) charitable effort" 2022-12-10T00:27:14.859Z
New Feature: Collaborative editing now supports logged-out users 2022-12-02T02:41:52.297Z
Dan Luu on Futurist Predictions 2022-09-14T03:01:27.275Z
My thoughts on direct work (and joining LessWrong) 2022-08-16T18:53:20.359Z
Keep your protos in one repo 2022-04-28T15:53:26.803Z
CFAR Workshop (Units of Exchange) - Los Angeles LW/ACX Meetup #180 (Wednesday, April 13th) 2022-04-12T03:05:20.923Z
The Engines of Cognition (cont.) - Los Angeles LW/ACX Meetup #175 (Wednesday, March 9th) 2022-03-09T19:09:46.526Z
Do any AI alignment orgs hire remotely? 2022-02-21T22:33:04.765Z
The Engines of Cognition, Volume 3 - Los Angeles LW/ACX Meetup #169 (Wednesday, January 26th) 2022-01-23T23:31:06.642Z
Considerations on Compensation 2021-10-06T23:37:56.255Z
West Los Angeles, CA – ACX Meetups Everywhere 2021 2021-08-23T08:51:45.509Z
Effects of Screentime on Wellbeing - Los Angeles LW/SSC Meetup #153 (Wednesday, July 21st) 2021-07-20T21:03:16.128Z
Assorted Topics - Los Angeles LW/SSC Meetup #152 (Wednesday, July 14th) 2021-07-13T20:22:51.288Z
Welcome Polygenically Screened Babies - LW/SSC Meetup #151 (Wednesday, July 7th) 2021-07-05T06:46:46.125Z
Should we build thermometer substitutes? 2020-03-19T07:43:35.243Z
How can your own death be bad? Los Angeles LW/SSC Meetup #150 (Wednesday, March 4th) 2020-03-04T22:57:16.629Z
Sleeping Beauty - Los Angeles LW/SSC Meetup #149 (Wednesday, February 26th) 2020-02-26T20:30:19.812Z
Newcomb's Paradox 3: What You Don't See - Los Angeles LW/SSC Meetup #148 (Wednesday, February 19th) 2020-02-19T20:55:53.362Z
Newcomb's Paradox: Take Two - Los Angeles LW/SSC Meetup #147 (Wednesday, February 12th) 2020-02-11T06:03:07.552Z
Newcomb's Paradox - Los Angeles LW/SSC Meetup #146 (Wednesday, February 5th) 2020-02-05T03:49:11.541Z
Peter Norvig Contra Chomsky - Los Angeles LW/SSC Meetup #145 (Wednesday, January 29th) 2020-01-28T06:30:50.503Z
Moral Mazes - Los Angeles LW/SSC Meetup #144 (Wednesday, January 22nd) 2020-01-22T22:55:10.302Z
Data Bias - Los Angeles LW/SSC Meetup #143 (Wednesday, January 15th) 2020-01-15T21:49:58.949Z
Iterate Fast - Los Angeles LW/SSC Meetup #142 (Wednesday, January 8th) 2020-01-08T22:53:17.893Z
Predictions - Los Angeles LW/SSC Meetup #141 (Wednesday, January 1st) 2020-01-01T22:14:12.686Z
Your Price for Joining - Los Angeles LW/SSC Meetup #140 (Wednesday, December 18th) 2019-12-18T23:03:16.960Z
Concretize Multiple Ways - Los Angeles LW/SSC Meetup #139 (Wednesday, December 11th) 2019-12-11T21:38:56.118Z
Execute by Default - Los Angeles LW/SSC Meetup #138 (Wednesday, December 4th) 2019-12-04T23:00:49.503Z
Antimemes - Los Angeles LW/SSC Meetup #137 (Wednesday, November 27th) 2019-11-27T20:46:01.504Z
PopSci Considered Harmful - Los Angeles LW/SSC Meetup #136 (Wednesday, November 20th) 2019-11-20T22:08:34.467Z
Do Not Call Up What You Cannot Put Down - Los Angeles LW/SSC Meetup #135 (Wednesday, November 13th) 2019-11-13T22:11:33.257Z
Warnings From Self-Knowledge - Los Angeles LW/SSC Meetup #134 (Wednesday, November 6th) 2019-11-06T23:22:43.391Z
Technique Taboo or Autopilot - Los Angeles LW/SSC Meetup #133 (Wednesday, October 30th) 2019-10-30T20:39:51.357Z
Assume Misunderstandings - Los Angeles LW/SSC Meetup #132 (Wednesday, October 23rd) 2019-10-23T21:10:22.388Z

Comments

Comment by RobertM (T3t) on RobertM's Shortform · 2024-04-18T20:43:10.187Z · LW · GW

I think there might be many local improvements, but I'm pretty uncertain about important factors like the elasticity of "demand" (for robbery) with respect to how much of a medication is available on demand.  I.e., how many fewer robberies do you get if a robber can get at most a single prescription's worth of some kind of controlled substance (and not necessarily any specific one), compared to "none" (the current situation) or "whatever the pharmacy has in stock" (not actually sure if this was the previous situation - maybe they had time delay safes for storing medication that wasn't filling a prescription, and just didn't store the filled prescriptions in the safes as well)?

Comment by RobertM (T3t) on RobertM's Shortform · 2024-04-18T03:49:15.923Z · LW · GW

Headline claim: time delay safes are probably much too expensive in human time costs to justify their benefits.

The largest pharmacy chains in the US, accounting for more than 50% of the prescription drug market[1][2], have been rolling out time delay safes (to prevent theft)[3].  Although I haven't confirmed that this is true across all chains and individual pharmacy locations, I believe these safes are used for all controlled substances.  These safes open ~5-10 minutes after being prompted.

There were >41 million prescriptions dispensed for Adderall in the US in 2021[4].  (Note that this likely means ~12x fewer people were prescribed Adderall that year, since prescriptions are typically filled monthly.)  Multiply that by 5 minutes and you get >200 million minutes, or >390 person-years, wasted.  Now, surely some of that time is partially recaptured by e.g. people doing their shopping while waiting, or by various other substitution effects.  But that's also just Adderall!
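
A minimal sketch of the arithmetic above, assuming only the cited figures (>41 million prescriptions, ~5 minutes of waiting per fill):

```python
# Back-of-the-envelope check of the person-years figure.
prescriptions = 41_000_000       # cited lower bound for US Adderall prescriptions, 2021
minutes_per_fill = 5             # assumed time-delay wait per fill

total_minutes = prescriptions * minutes_per_fill      # 205,000,000 minutes
person_years = total_minutes / (60 * 24 * 365.25)     # ~390 person-years
print(total_minutes, round(person_years))
```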

Seems quite unlikely that this is on the efficient frontier of crime-prevention mechanisms, but alas, the stores aren't the ones (mostly) paying the costs imposed by their choices, here.

  1. ^

    https://www.mckinsey.com/industries/healthcare/our-insights/meeting-changing-consumer-needs-the-us-retail-pharmacy-of-the-future

  2. ^

    https://www.statista.com/statistics/734171/pharmacies-ranked-by-rx-market-share-in-us/

  3. ^

    https://www.cvshealth.com/news/pharmacy/cvs-health-completes-nationwide-rollout-of-time-delay-safes.html

  4. ^

    https://www.axios.com/2022/11/15/adderall-shortage-adhd-diagnosis-prescriptions

Comment by RobertM (T3t) on General Thoughts on Secular Solstice · 2024-03-24T00:29:58.460Z · LW · GW

use spaces that your community already has (Lighthaven?), even if they're not quite set up the right way for them

Not set up the right way would be an understatement, I think.  Lighthaven doesn't have an indoor space which can seat several hundred people, and trying to do it outdoors seems like it'd require solving maybe-intractable logistical problems (weather, acoustics, etc).  (Also Lighthaven was booked, and it's not obvious to me to what degree we'd want to subsidize the solstice celebration.  It'd also require committing a year ahead of time, since most other suitable venues are booked up for the holidays quite far in advance.)

I don't think there are other community venues that could host the solstice celebration for free, but there might be opportunities for cheaper (or free) venues outside the community (with various trade-offs).

Comment by RobertM (T3t) on "How could I have thought that faster?" · 2024-03-18T04:49:30.795Z · LW · GW

Having said that, I would NOT describe this as asking "how could I have arrived at the same destination by a shorter route". I would just describe it as asking "what did I learn here, really".

I mean, yeah, they're different things.  If you can figure out how to get to the correct destination faster next time you're trying to figure something out, that seems obviously useful.

Comment by RobertM (T3t) on Matthew Barnett's Shortform · 2024-03-08T02:05:26.970Z · LW · GW

Some related thoughts.   I think the main issue here is actually making the claim of permanent shutdown & deletion credible.  I can think of some ways to get around a few obvious issues, but others (including moral issues) remain, and in any case the current AGI labs don't seem like the kinds of organizations which can make that kind of commitment in a way that's both sufficiently credible and legible that the remaining probability mass on "this is actually just a test" wouldn't tip the scales.

Comment by RobertM (T3t) on Many arguments for AI x-risk are wrong · 2024-03-05T07:33:32.768Z · LW · GW

I am not covering training setups where we purposefully train an AI to be agentic and autonomous. I just think it's not plausible that we just keep scaling up networks, run pretraining + light RLHF, and then produce a schemer.[2]

Like Ryan, I'm interested in how much of this claim is conditional on "just keep scaling up networks" being insufficient to produce relevantly-superhuman systems (i.e. systems capable of doing scientific R&D better and faster than humans, without humans in the intellectual part of the loop).  If it's "most of it", then my guess is that accounts for a good chunk of the disagreement.

Comment by RobertM (T3t) on Speaking to Congressional staffers about AI risk · 2024-02-28T19:28:25.671Z · LW · GW

Curated.  I liked that this post had a lot of object-level detail about a process that is usually opaque to outsiders, and that the "Lessons Learned" section was also grounded enough that someone reading this post might actually be able to skip "learning from experience", at least for a few possible issues that might come up if one tried to do this sort of thing.

Comment by RobertM (T3t) on Less Wrong automated systems are inadvertently Censoring me · 2024-02-21T19:33:58.331Z · LW · GW

(We check for "downvoter count within window", not all-time.)

Comment by RobertM (T3t) on And All the Shoggoths Merely Players · 2024-02-15T04:42:13.413Z · LW · GW

Curated.  This dialogue distilled a decent number of points I consider cruxes between these two (clusters of) positions.  I also appreciated the substantial number of references linking back to central and generally high-quality examples of each argument being made; I think this is especially helpful when writing a dialogue meant to represent positions people actually hold.

I look forward to the next installment.

Comment by RobertM (T3t) on MIRI 2024 Mission and Strategy Update · 2024-01-08T04:19:46.438Z · LW · GW

Here's the editor guide section for spoilers.  (Note that I tested the instructions for markdown, and that does indeed seem broken in a weird way; the WYSIWYG spoilers still work normally but only support "block" spoilers; you can't do it for partial bits of lines.)

In this case I think a warning at the top of the comment is sufficient, given the context of the rest of the thread, so it's up to you whether you want to try to reformat your comment around our technical limitations.

Comment by T3t on [deleted post] 2024-01-08T04:16:02.865Z

Foobar! :::spoiler This text would be covered by a spoiler block. ::: test more stuff on the same line.

Comment by RobertM (T3t) on OpenAI, DeepMind, Anthropic, etc. should shut down. · 2023-12-19T08:26:59.324Z · LW · GW

if they all shut down because their employees all quit.

Comment by RobertM (T3t) on OpenAI, DeepMind, Anthropic, etc. should shut down. · 2023-12-19T06:25:21.611Z · LW · GW

I think that to the extent that other labs are "not far behind" (such as FAIR), this is substantially an artifact of them being caught up in a competitive arms race.  Catching up to "nearly SOTA" is usually much easier than "advancing SOTA", and I'm fairly persuaded by the argument that the top 3 labs are indeed ideologically motivated in ways that most other labs aren't, and there would be much less progress in dangerous directions if they all shut down because their employees all quit.

Comment by RobertM (T3t) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-08T06:10:31.466Z · LW · GW

I mean, it was published between 2010 and 2015, and it's extremely rare for fanfiction (or other kinds of online serial fiction) to be more popular after completion than while it's in progress.  I followed it while it was in progress, and am in fact one of those people who found LessWrong through it.  There was definitely an observable "wave" of popularity in both my in-person and online circles (which were not, at the time, connected to the rationality community at all); I think it probably peaked in 2012 or 2013.

Comment by RobertM (T3t) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-08T02:30:43.918Z · LW · GW

I suspect a lot of problems came from the influx of HPMOR and FiO readers into LW without any other grounding point. Niplav thankfully avoided this wave, and I buy that the empirical circles like Ryan Greenblatt's social circle isn't relying on fiction, but I'm worried that a lot of non-experts are so bad epistemically speaking that they ended up essentially writing fanfic on AI doom, and forget to check whether the assumptions actually hold in reality.

This seems unlikely to me, since HPMOR peaked in popularity nearly a decade ago.

Comment by RobertM (T3t) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-06T23:01:25.789Z · LW · GW

If you get value out of it, we're happy for dialogues to be used that way, as long as it's clear to all participants what the expectations re: publishing/not publishing are, so that nobody has an unpleasant surprise at the end of the day.  (Dialogues currently allow any participant to unilaterally publish, since most other options we could think of imposed a lot of friction on publishing.)

Comment by RobertM (T3t) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-06T22:58:26.383Z · LW · GW

We also have a system which automatically applies "core tags" (AI, Rationality, World Modeling, World Optimization, Community, and Practical) to new posts.  It's accurate enough, particularly with the AI tag, that it enables the use-case of "filter out all AI posts from the homepage", which a non-zero number of users want, even if we still need to sometimes fix the tags applied to posts.

Comment by RobertM (T3t) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-06T22:49:50.753Z · LW · GW

Intercom has the benefit of acting as an inbox on our side, unlike comments posted on LW (which may not be seen by any LW team member).

In an ideal world, would Github Issues be better for tracking bug reports?  Probably, yes.  But Github Issues require that the user reporting an issue navigate to a different page and have a Github account, which approximately makes it a non-starter as the top-of-funnel.

Intercom's message re: response times has some limited configurability, but it's difficult to make it say exactly the right thing here.  Triaging bug reports from Intercom messages is a standard part of our daily workflow, so you shouldn't model yourself as imposing unusual costs on the team by reporting bugs through Intercom.

re: reliability - yep, we are not totally reliable here.  There are probably relatively easy process improvements here that we will end up not implementing because figuring out & implementing such process improvements takes time, which means it's competing with everything else we might decide to spend time on.  Nevertheless I'm sorry about the variety of dropped balls; it's possible we will try to improve something here.

re: issue tracker - right now our process is approximately "toss bugs into a dedicated slack channel, shared with the EA forum".  The EA forum has a more developed issue-tracking process, so some of those do find their way to Github Issues (eventually).

Comment by RobertM (T3t) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-06T05:43:12.857Z · LW · GW

Just as an FYI: pinging us on Intercom is a much more reliable way of ensuring we see feature suggestions or bug reports than posting comments.  Most feature suggestions won't be implemented[1]; bug reports are prioritized according to urgency/impact and don't always rise to the level of "will be addressed" (though I think >50% do).

  1. ^

    At least not as a result of a single person suggesting them; we have only ever made decisions that were influenced on the margin by suggestions from one or more LW users.

Comment by RobertM (T3t) on RobertM's Shortform · 2023-12-03T21:35:23.087Z · LW · GW

Not all of these are NDAs; my understanding is that the OpenPhil request comes along with the news of the grant (and isn't a contract).  Really my original shortform should've been a broader point about confidentiality/secrecy norms, but...

Comment by RobertM (T3t) on RobertM's Shortform · 2023-12-03T09:08:27.076Z · LW · GW

I have more examples, but unfortunately some of them I can't talk about.  A few random things that come to mind:

  • OpenPhil routinely requests that grantees not disclose that they've received an OpenPhil grant until OpenPhil publishes it themselves, which usually happens many months after the grant is disbursed.
  • Nearly every instance that I know of where EA leadership refused to comment on anything publicly post-FTX due to advice from legal counsel.
  • So many things about the Nonlinear situation.
  • Coordination Forum requiring that attendees agree to confidentiality re: the attendance of, and the content of any conversations with, people who wanted to attend but not have their attendance known to the wider world (like SBF, and also people in the AI policy space).

Comment by RobertM (T3t) on RobertM's Shortform · 2023-12-03T05:25:58.881Z · LW · GW

As a recent example, from this article on the recent OpenAI kerfuffle:

Two people familiar with the board’s thinking say that the members felt bound to silence by confidentiality constraints.

Comment by RobertM (T3t) on MATS Summer 2023 Retrospective · 2023-12-02T08:12:37.019Z · LW · GW

I have observed that the term "postmortem" suffers from some semantic slippage and is sometimes used to mean something similar to "retrospective".

Comment by RobertM (T3t) on RobertM's Shortform · 2023-12-02T05:41:17.124Z · LW · GW

NDAs sure do seem extremely costly.  My current sense is that it's almost never worth signing one, or binding oneself to confidentiality in any similar way, for anything except narrowly-scoped technical domains (such as capabilities research).

Comment by RobertM (T3t) on AI #40: A Vision from Vitalik · 2023-12-01T03:33:44.166Z · LW · GW

I think Schmidhuber does in fact think that humans will go extinct as a result of developing ASI: https://www.lesswrong.com/posts/BEtQALqgXmL9d9SfE/q-and-a-with-juergen-schmidhuber-on-risks-from-ai

Comment by RobertM (T3t) on Cis fragility · 2023-11-30T07:12:16.251Z · LW · GW

This is subtweeting the fact that (iirc) the OP wrote a comment sort of similar to the one described, on Duncan's most recent post.  I think it is much less surprising that such a comment got downvoted on LW (vs. the EA forum), and am not at all surprised that Duncan then deleted the comment and banned the OP from commenting on their posts, after the comment spawned a (probably-annoying-to-Duncan) longer comment thread.

Comment by RobertM (T3t) on OpenAI: The Battle of the Board · 2023-11-23T05:42:26.604Z · LW · GW

On the question of aptitude for science, Summers said this: "It does appear that on many, many different human attributes -- height, weight, propensity for criminality, overall IQ, mathematical ability, scientific ability -- there is relatively clear evidence that whatever the difference in means -- which can be debated -- there is a difference in the standard deviation, and variability of a male and a female population. And that is true with respect to attributes that are and are not plausibly, culturally determined. If one supposes, as I think is reasonable, that if one is talking about physicists at a top 25 research university, one is not talking about people who are two standard deviations above the mean. And perhaps it's not even talking about somebody who is three standard deviations above the mean. But it's talking about people who are three and a half, four standard deviations above the mean in the one in 5,000, one in 10,000 class. Even small differences in the standard deviation will translate into very large differences in the available pool substantially out."

Source (the article linked to in the Axios article you cite)
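
To illustrate the statistical claim in the quoted passage (a toy sketch of my own, not from Summers or the article): if two groups have the same mean but one has a modestly larger standard deviation, the higher-variance group is overrepresented several standard deviations out by a large factor.

```python
import math

def upper_tail(threshold, sigma):
    """P(X > threshold) for a zero-mean normal with standard deviation sigma."""
    return 0.5 * math.erfc(threshold / (sigma * math.sqrt(2)))

cutoff = 4.0                                 # "four standard deviations above the mean" of the baseline group
p_low_var = upper_tail(cutoff, sigma=1.0)    # ~3.2e-05
p_high_var = upper_tail(cutoff, sigma=1.1)   # ~1.4e-04
print(p_high_var / p_low_var)                # roughly a 4x larger pool from a 10% larger SD
```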

Putting that aside, I don't see what part of Zvi's claim you think is taking things further than they deserve to be taken.  It seems indisputably true that Summers is both a well-known bullet-biter[1], and also that he has had some associations with Effective Altruism[2].  That is approximately the extent of Zvi's claims re: Summers.  I don't see anything that Zvi wrote as implying some sort of broader endorsement of Summers' character or judgment.

  1. ^

    See above.

  2. ^

    https://harvardundergradea.org/podcast/2018/5/19/larry-summers-on-his-career-lessons-and-effective-altruism

Comment by RobertM (T3t) on Sam Altman's ouster at OpenAI was precipitated by letter to board about AI breakthrough - Reuters · 2023-11-23T00:51:55.697Z · LW · GW

Yep, thanks, corrected.  (I think that still could possibly imply the consequent, but it's a much weaker connection.)

Comment by RobertM (T3t) on Sam Altman's ouster at OpenAI was precipitated by letter to board about AI breakthrough - Reuters · 2023-11-22T23:51:18.128Z · LW · GW

At the same time, there are claims that Emmett couldn't get the board to explain their reasoning to him or give him written documentation of it, so if those claims are true, then "nothing to do with a specific safety issue" might just be a claim from the board by proxy.

Comment by RobertM (T3t) on Redirecting one’s own taxes as an effective altruism method · 2023-11-14T03:13:40.155Z · LW · GW

I think this post neglects one of the most serious risks: that adopting such a strategy is a correlated decision across agents, that others will correctly see that happening, and that the downside risk is significantly magnified by those dynamics.

Naive 1st-order utilitarianism gives you the wrong answer here.  Do not illegally skip out on paying your income taxes to donate to charity.  Spend your cognitive resources getting a better job, or otherwise legally optimizing for more income.  Being a software engineer at many tech companies will enable you to donate six figures per year while maintaining a comfortable lifestyle, without restricting your ability to engage with financial infrastructure, own property, or travel.

Comment by RobertM (T3t) on RobertM's Shortform · 2023-11-14T02:56:54.970Z · LW · GW

I am pretty concerned that most of the public discussion about risk from e.g. the practice of open-sourcing frontier models is focused on misuse risk (particularly biorisk).  Misuse risk seems like it could be a real thing, but it's not where I see most of the negative EV when it comes to open-sourcing frontier models.  I also suspect that many people doing comms work are focusing on misuse risk in ways that are strongly disproportionate to how much of the negative EV they see coming from it, relative to all sources.

I think someone should write a summary post covering "why open-sourcing frontier models and AI capabilities more generally is -EV".  Key points to hit:

  • (1st order) directly accelerating capabilities research progress
  • (1st order) we haven't totally ruled out the possibility of hitting "sufficiently capable systems" which are at least possible in principle to use in +EV ways, but which if made public would immediately have someone point them at improving themselves and then we die.  (In fact, this is very approximately the mainline alignment plan of all 3 major AGI orgs.)
  • (2nd order) generic "draws in more money, more attention, more skilled talent, etc" which seems like it burns timelines

And, sure, misuse risks (which in practice might end up being a subset of the second bullet point, but not necessarily so).  But in reality, LLM-based misuse risks probably don't end up being x-risks, unless biology turns out to be so shockingly easy that a (relatively) dumb system can come up with something that gets ~everyone in one go.

Comment by T3t on [deleted post] 2023-11-12T23:44:40.236Z

And I'm confused why you're simultaneously complaining about lack of community epistemic rigor, but then also criticize Scott's time spent on research. Don't those considerations point in opposite directions?

Well, not necessarily - the judgment re: lack of epistemic rigor could be coming from having decided that there's an obvious right answer and observing everybody else arriving at the wrong answer, not from a lack of research effort that preceded arriving at the wrong answer.

ETA: I do currently think[1] that kidney donation is probably more appropriately bucketed as "buying fuzzies" rather than "buying utilons" for most people in the relevant reference class, but I can imagine a set of beliefs & circumstances that tip it into the other bucket.

  1. ^

    Not a position I've spent a lot of time thinking about.  Maybe an hour?

Comment by RobertM (T3t) on AI #37: Moving Too Fast · 2023-11-11T06:00:42.492Z · LW · GW

whether or not this is an accurate description of Hobart’s views or his talk

Hobart (and Askonas) respond in Sergey's comment section, if anyone wants more detail.

Comment by RobertM (T3t) on RobertM's Shortform · 2023-11-11T04:03:52.093Z · LW · GW

Reducing costs equally across the board in some domain is bad news in any situation where offense is favored. Reducing costs equally-in-expectation (but unpredictably, with high variance) can be bad even if offense isn't favored, since you might get unlucky and the payoffs aren't symmetrical.

(re: recent discourse on bio risks from misuse of future AI systems.  I don't know that I think those risks are particularly likely to materialize, and most of my expected disutility from AI progress doesn't come from that direction, but I've seen a bunch of arguments that seem to be skipping some steps when trying to argue that progress on ability to do generic biotech is positive EV.  To be fair, the arguments for why we should expect it to be negative EV are often also skipping those steps.  My point is that a convincing argument in either direction needs to justify its conclusion in more depth; the heuristics I reference above aren't strong enough to carry the argument.)
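
A toy sketch of the second point above (the numbers are mine and purely illustrative): a cost reduction that is symmetric in expectation but lands unpredictably can still be negative-EV when the payoffs are asymmetric.

```python
# 50/50 on whether the cheaper capability mostly helps defense or offense.
p_helps_defense = 0.5
p_helps_offense = 0.5

value_if_defense_benefits = +1    # modest gain from cheaper defensive work
value_if_offense_benefits = -10   # much larger loss if misuse gets cheaper

expected_value = (p_helps_defense * value_if_defense_benefits
                  + p_helps_offense * value_if_offense_benefits)
print(expected_value)  # -4.5: negative even though the cost reduction was "even-handed" in expectation
```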

Comment by RobertM (T3t) on My thoughts on the social response to AI risk · 2023-11-02T17:50:32.418Z · LW · GW

Huh, I think that's probably a bug.  Will look into it.

Comment by RobertM (T3t) on We're Not Ready: thoughts on "pausing" and responsible scaling policies · 2023-10-28T03:22:39.328Z · LW · GW

You're aware that there's only one public RSP?

Comment by RobertM (T3t) on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-27T05:12:51.278Z · LW · GW

Of course very few "doomers" think that current LLMs behave in ways that we parse as "nice" because they have a "hidden, instrumental motive for being nice" (in the sense that I expect you meant that).  Current LLMs likely aren't coherent & self-aware enough to have such hidden, instrumental motives at all.

Comment by RobertM (T3t) on Evolution Solved Alignment (what sharp left turn?) · 2023-10-17T07:24:32.975Z · LW · GW

The broader laws of natural selection and survival of the fittest

These are distinct from biological evolution, so if our descendants end up being optimized over by them rather than by biological evolution, that feels like it's conceding the argument (that we'll have achieved escape velocity).

Comment by RobertM (T3t) on Dishonorable Gossip and Going Crazy · 2023-10-15T05:34:16.533Z · LW · GW

"I'm worried that if we let someone go off and try something different, they will suddenly become way less open to changing their mind, and be dead set on thinking they've found the One True Way" seems like something weird to be worried about.

This both seems like a totally reasonable concern to have, and also missing many of the concerning elements of the thing it's purportedly summarizing, like, you know, suddenly having totally nonsensical beliefs about the world.

Comment by RobertM (T3t) on Evolution Solved Alignment (what sharp left turn?) · 2023-10-14T03:23:15.223Z · LW · GW

"The training loop is still running" is not really a counterargument to "the training loop accidentally spat out something smarter than the training loop, that wants stuff other than the stuff the training loop is optimizing for, that iterates at a much faster speed than the training, and looks like it's just about to achieve total escape velocity".

Comment by RobertM (T3t) on The commenting restrictions on LessWrong seem bad · 2023-09-16T21:02:32.069Z · LW · GW

It seems like the major crux here is whether we think that debates over claim and counter-claim (basically, other cruxes) are likely to be useful or likely to cause harm. It seems from talking to the mods here and reading a few of their comments on this topic that they tend to learn towards them being harmful on average and thus need to be pushed down a bit.

This is, as far as I can tell, totally false.  There is a very different claim one could make which at least more accurately represents my opinion, i.e. see this comment by John Wentworth (who is not a mod).

Most of your comment seems to be an appeal to modest epistemology.  We can in fact do better than total agnosticism about whether some arguments are productive or not, and worth having more or less of on the margin.

Comment by RobertM (T3t) on Highlights: Wentworth, Shah, and Murphy on "Retargeting the Search" · 2023-09-15T21:17:38.132Z · LW · GW

John mentioned the existence of What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems?, which was something of a follow-up post to How To Go From Interpretability To Alignment: Just Retarget The Search, and continues in a similar direction.

Comment by RobertM (T3t) on AI #27: Portents of Gemini · 2023-09-01T23:52:08.689Z · LW · GW

evhub was at 80% about a year ago (currently at Anthropic, interned at OpenAI).

Daniel Kokotajlo was at 65% ~2 years ago; I think that number's gone up since then.

Quite a few other people at Anthropic also have pessimistic views, according to Chris Olah:

I wouldn't want to give an "official organizational probability distribution", but I think collectively we average out to something closer to "a uniform prior over possibilities" without that much evidence thus far updating us from there. Basically, there are plausible stories and intuitions pointing in lots of directions, and no real empirical evidence which bears on it thus far.

(Obviously, within the company, there's a wide range of views. Some people are very pessimistic. Others are optimistic. We debate this quite a bit internally, and I think that's really positive! But I think there's a broad consensus to take the entire range seriously, including the very pessimistic ones.)

The DeepMind alignment team probably has at least a couple of people who think the odds are bad (p(doom) > 50%), given the way Vika buckets the team, combined with the distribution of views reflected by DeepMind alignment team opinions on AGI ruin arguments.

Some corrections for your overall description of the DM alignment team:

  • I would count ~20-25 FTE on the alignment + scalable alignment teams (this does not include the AGI strategy & governance team)
  • I would put DM alignment in the "fairly hard" bucket (p(doom) = 10-50%) for alignment difficulty, and the "mixed" bucket for "conceptual vs applied"

Comment by RobertM (T3t) on AI #27: Portents of Gemini · 2023-09-01T18:04:13.932Z · LW · GW

Please refer back to your original claim:

no one in the industry seems to think they are writing their own death warrant

Comment by RobertM (T3t) on AI #27: Portents of Gemini · 2023-08-31T23:00:04.646Z · LW · GW

no one in the industry seems to think they are writing their own death warrant

 

What led you to believe this? Plenty of people working at the top labs have very high p(doom) (>80%).  Several of them comment on LessWrong.  We have a survey of the broader industry as well.  Even the people running the top 3 labs (Sam Altman, Dario Amodei, and Demis Hassabis) all think it's likely enough that it's worth dedicating a significant percentage of their organizational resources to researching alignment.

Comment by RobertM (T3t) on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-08-31T06:03:51.246Z · LW · GW

Most "Bayesians" are deceiving themselves about how much they are using it.

This is a frequently-made accusation which has very little basis in reality.  The world is a big place, so you will be able to find some examples of such people, but central examples of LessWrong readers, rationalists, etc, are not going around claiming that they run their entire lives on explicit Bayes.

Comment by RobertM (T3t) on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-08-31T05:59:49.553Z · LW · GW

Most (but not all) automatic rate limits allow authors to continue to comment on their own posts, since in many such cases it does indeed seem likely that preventing that would be counterproductive.

Comment by RobertM (T3t) on My current LK99 questions · 2023-08-13T04:07:57.741Z · LW · GW

Curated.

Although the LK-99 excitement has cooled off, this post stands as an excellent demonstration of why and how Bayesian reasoning is helpful: when faced with surprising or confusing phenomena, understanding how to partition your model of reality such that new evidence would provide the largest updates is quite valuable.  Even if the questions you construct are themselves confused or based on invalid premises, they're often confused in a much more legible way, such that domain experts can do a much better job of pointing to that and saying something like "actually, there's a third alternative", or "A wouldn't imply B in any situation, so this provides no evidence".
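
A minimal illustration of the "partition so evidence updates you the most" idea (a toy sketch of my own, not drawn from the post): given two competing hypotheses, the question whose possible answers have very different likelihoods under each hypothesis is the one that moves your credence the most.

```python
def posterior(prior_h1, p_obs_given_h1, p_obs_given_h2):
    """Posterior probability of H1 after seeing the observation (two-hypothesis Bayes update)."""
    joint_h1 = prior_h1 * p_obs_given_h1
    joint_h2 = (1 - prior_h1) * p_obs_given_h2
    return joint_h1 / (joint_h1 + joint_h2)

prior = 0.10  # prior credence that the surprising effect is real

# Question A: the answer is almost equally likely either way -- weak evidence, small update.
print(posterior(prior, p_obs_given_h1=0.60, p_obs_given_h2=0.50))  # ~0.12

# Question B: the answer is far more likely if the effect is real -- strong evidence, large update.
print(posterior(prior, p_obs_given_h1=0.90, p_obs_given_h2=0.05))  # ~0.67
```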

Comment by RobertM (T3t) on Yann LeCun on AGI and AI Safety · 2023-08-09T19:25:07.329Z · LW · GW

This seems like an epistemically dangerous way of describing the situation that "These people think that AI x-risk arguments are incorrect, and are willing to argue for that position".

I don't think the comment you're responding to is doing this; I think it's straightforwardly accusing LeCun and Andreesen of conducting an infowar against AI safety.  It also doesn't claim that they don't believe their own arguments.

Now, the "deliberate infowar in service of accelerationism" framing seems mostly wrong to me (at least with respect to LeCun; I wouldn't be surprised if there was a bit of that going on elsewhere), but sometimes that is a thing that happens and we need to be able to discuss whether that's happening in any given instance.  re: your point about tribalism, this does carry risks of various kinds of motivated cognition, but the correct answer is not to cordon off a section of reality and declare it off-limits for discussion.

Comment by RobertM (T3t) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-09T01:08:22.430Z · LW · GW

So "being happy" or "being a utility-maximizer" will probably end up being a terminal goal, because those are unlikely to conflict with any other goals. 

"Being unlikely to conflict with other values" is not at the core of what characterizes the difference between instrumental and terminal values.

If you're talking about goals related purely to the state of the external world, not related to the agent's own inner-workings or its own utility function, why do you think it would still want to keep its goals immutable with respect to just the external world?

Putting aside the fact that agents are embedded in the environment, and that values which reference the agent's internals are usually not meaningfully different from values which reference things external to the agent... can you describe what kinds of values that reference the external world are best satisfied by those same values being changed?