LessWrong 2.0 Reader

A few thoughts on my self-study for alignment research
Thomas Kehrenberg (thomas-kehrenberg) · 2022-12-30T22:05:58.859Z · comments (0)
Christmas Microscopy
jefftk (jkaufman) · 2022-12-30T21:10:01.937Z · comments (0)
What "upside" of AI?
False Name (False Name, Esq.) · 2022-12-30T20:58:49.165Z · comments (5)
Evidence on recursive self-improvement from current ML
beren · 2022-12-30T20:53:22.462Z · comments (12)
[question] Is ChatGPT TAI?
Amal (asta-vista) · 2022-12-30T19:44:50.508Z · answers+comments (5)
My thoughts on OpenAI's alignment plan
Akash (akash-wasil) · 2022-12-30T19:33:15.019Z · comments (3)
Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence
Akira Pyinya · 2022-12-30T19:05:24.664Z · comments (4)
10 Years of LessWrong
JohnBuridan · 2022-12-30T17:15:17.498Z · comments (2)
Chatbots as a Publication Format
derek shiller (derek-shiller) · 2022-12-30T14:11:21.015Z · comments (6)
Human sexuality as an interesting case study of alignment
beren · 2022-12-30T13:37:20.176Z · comments (26)
The Twitter Files: Covid Edition
Zvi · 2022-12-30T13:30:01.073Z · comments (2)
Worldly Positions archive, briefly with private drafts
KatjaGrace · 2022-12-30T12:20:05.430Z · comments (0)
Models Don't "Get Reward"
Sam Ringer · 2022-12-30T10:37:11.798Z · comments (61)
The hyperfinite timeline
Alok Singh (OldManNick) · 2022-12-30T09:30:06.483Z · comments (6)
Reactive devaluation: Bias in Evaluating AGI X-Risks
Remmelt (remmelt-ellen) · 2022-12-30T09:02:58.450Z · comments (9)
Things I carry almost every day, as of late December 2022
DanielFilan · 2022-12-30T07:40:01.261Z · comments (9)
More ways to spot abysses
KatjaGrace · 2022-12-30T06:30:06.301Z · comments (1)
Language models are nearly AGIs but we don't notice it because we keep shifting the bar
philosophybear · 2022-12-30T05:15:15.625Z · comments (13)
[link] Progress links and tweets, 2022-12-29
jasoncrawford · 2022-12-30T04:54:51.905Z · comments (0)
Announcing The Filan Cabinet
DanielFilan · 2022-12-30T03:10:00.494Z · comments (2)
[question] Effective Evil Causes?
Ulisse Mini (ulisse-mini) · 2022-12-30T02:56:31.459Z · answers+comments (2)
But is it really in Rome? An investigation of the ROME model editing technique
jacquesthibs (jacques-thibodeau) · 2022-12-30T02:40:36.713Z · comments (1)
A Year of AI Increasing AI Progress
ThomasW (ThomasWoodside) · 2022-12-30T02:09:39.458Z · comments (3)
Why not spend more time looking at human alignment?
ajc586 (Adrian Cable) · 2022-12-30T00:22:13.666Z · comments (3)
Why and how to write things on the Internet
benkuhn · 2022-12-29T22:40:04.636Z · comments (2)
[link] Friendly and Unfriendly AGI are Indistinguishable
ErgoEcho · 2022-12-29T22:13:00.434Z · comments (4)
200 COP in MI: Looking for Circuits in the Wild
Neel Nanda (neel-nanda-1) · 2022-12-29T20:59:53.267Z · comments (5)
Thoughts on the implications of GPT-3, two years ago and NOW [here be dragons, we're swimming, flying and talking with them]
Bill Benzon (bill-benzon) · 2022-12-29T20:05:31.062Z · comments (0)
Covid 12/29/22: Next Up is XBB.1.5
Zvi · 2022-12-29T18:20:00.943Z · comments (4)
Entrepreneurship ETG Might Be Better Than 80k Thought
Xodarap · 2022-12-29T17:51:13.412Z · comments (0)
Internal Interfaces Are a High-Priority Interpretability Target
Thane Ruthenis · 2022-12-29T17:49:27.450Z · comments (6)
CFP for Rebellion and Disobedience in AI workshop
Ram Rachum (ram@rachum.com) · 2022-12-29T16:08:05.035Z · comments (0)
My scorched-earth policy on New Year’s resolutions
PatrickDFarley · 2022-12-29T14:45:47.126Z · comments (2)
Don't feed the void. She is fat enough!
Johannes C. Mayer (johannes-c-mayer) · 2022-12-29T14:18:44.526Z · comments (0)
[question] Is there any unified resource on Eliezer's fatigue?
Johannes C. Mayer (johannes-c-mayer) · 2022-12-29T14:04:53.488Z · answers+comments (2)
Logical Probability of Goldbach’s Conjecture: Provable Rule or Coincidence?
avturchin · 2022-12-29T13:37:45.130Z · comments (15)
Where do you get your capabilities from?
tailcalled · 2022-12-29T11:39:05.449Z · comments (27)
[link] The commercial incentive to intentionally train AI to deceive us
Derek M. Jones (Derek-Jones) · 2022-12-29T11:30:28.267Z · comments (1)
Infinite necklace: the line as a circle
Alok Singh (OldManNick) · 2022-12-29T10:41:58.268Z · comments (2)
Privacy Tradeoffs
jefftk (jkaufman) · 2022-12-29T03:40:01.463Z · comments (1)
Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world
philosophybear · 2022-12-29T03:26:12.485Z · comments (43)
Large Language Models Suggest a Path to Ems
anithite (obserience) · 2022-12-29T02:20:01.753Z · comments (2)
[question] Book recommendations for the history of ML?
Eleni Angelou (ea-1) · 2022-12-28T23:50:55.512Z · answers+comments (2)
Rock-Paper-Scissors Can Be Weird
winwonce · 2022-12-28T23:12:11.329Z · comments (3)
200 COP in MI: The Case for Analysing Toy Language Models
Neel Nanda (neel-nanda-1) · 2022-12-28T21:07:03.838Z · comments (3)
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Neel Nanda (neel-nanda-1) · 2022-12-28T21:06:53.853Z · comments (0)
Effective ways to find love?
anonymoususer · 2022-12-28T20:46:23.247Z · comments (7)
Classical logic based on propositions-as-subsingleton-types
Thomas Kehrenberg (thomas-kehrenberg) · 2022-12-28T20:16:37.723Z · comments (0)
In Defense of Wrapper-Minds
Thane Ruthenis · 2022-12-28T18:28:25.868Z · comments (38)
[question] What is the best way to approach Expected Value calculations when payoffs are highly skewed?
jmh · 2022-12-28T14:42:51.169Z · answers+comments (16)