LessWrong 2.0 Reader

Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)

Monthly Roundup #22: September 2024
Zvi · 2024-09-17T12:20:08.297Z · comments (10)

Proposal for improving the global online discourse through personalised comment ordering on all websites
Roman Leventov · 2023-12-06T18:51:37.645Z · comments (21)

[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)

List of strategies for mitigating deceptive alignment
joshc (joshua-clymer) · 2023-12-02T05:56:50.867Z · comments (2)

[link] My article in The Nation — California’s AI Safety Bill Is a Mask-Off Moment for the Industry
garrison · 2024-08-15T19:25:59.592Z · comments (0)

Open Thread – Winter 2023/2024
habryka (habryka4) · 2023-12-04T22:59:49.957Z · comments (160)

Categories of leadership on technical teams
benkuhn · 2024-07-22T04:50:04.071Z · comments (0)

[link] Twitter thread on politics of AI safety
Richard_Ngo (ricraz) · 2024-07-31T00:00:34.298Z · comments (2)

Agency in Politics
Martin Sustrik (sustrik) · 2024-07-17T05:30:01.873Z · comments (2)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures (Workshop @ EA Hotel!)
Sahil · 2024-11-01T17:24:09.957Z · comments (2)

My Detailed Notes & Commentary from Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:51.894Z · comments (16)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (5)

Dangers of Closed-Loop AI
Gordon Seidoh Worley (gworley) · 2024-03-22T23:52:22.010Z · comments (9)

How I select alignment research projects
Ethan Perez (ethan-perez) · 2024-04-10T04:33:08.092Z · comments (4)

Index of rationalist groups in the Bay Area July 2024
Lucie Philippon (lucie-philippon) · 2024-07-26T16:32:25.337Z · comments (10)

Open consultancy: Letting untrusted AIs choose what answer to argue for
Fabien Roger (Fabien) · 2024-03-12T20:38:03.785Z · comments (5)

Empirical vs. Mathematical Joints of Nature
Elizabeth (pktechgirl) · 2024-06-26T01:55:22.858Z · comments (1)

Economics Roundup #2
Zvi · 2024-07-02T12:40:05.908Z · comments (5)

Representation Tuning
Christopher Ackerman (christopher-ackerman) · 2024-06-27T17:44:33.338Z · comments (9)

'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata
Mateusz Bagiński (mateusz-baginski) · 2023-11-15T16:00:48.926Z · comments (8)

Forecasting AI (Overview)
jsteinhardt · 2023-11-16T19:00:04.218Z · comments (0)

[link] AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks
aogara (Aidan O'Gara) · 2023-10-31T19:34:54.837Z · comments (1)

A sketch of acausal trade in practice
Richard_Ngo (ricraz) · 2024-02-04T00:32:54.622Z · comments (4)

[question] What is an "anti-Occamian prior"?
Zane · 2023-10-23T02:26:10.851Z · answers+comments (22)

[link] OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors
Joel Burget (joel-burget) · 2024-06-13T21:28:18.110Z · comments (10)

Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)

[link] Hyperreals in a Nutshell
Yudhister Kumar (randomwalks) · 2023-10-15T14:23:58.027Z · comments (27)

How predictive processing solved my wrist pain
max_shen (makoshen) · 2024-07-04T01:56:20.162Z · comments (8)

Humans aren't fleeb.
Charlie Steiner · 2024-01-24T05:31:46.929Z · comments (5)

An explanation for every token: using an LLM to sample another LLM
Max H (Maxc) · 2023-10-11T00:53:55.249Z · comments (5)

[link] List of Collective Intelligence Projects
Chipmonk · 2024-07-02T14:10:41.789Z · comments (9)

[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)

CHAI internship applications are open (due Nov 13)
Erik Jenner (ejenner) · 2023-10-26T00:53:49.640Z · comments (0)

Direction of Fit
NicholasKees (nick_kees) · 2023-10-02T12:34:24.385Z · comments (0)

[link] math terminology as convolution
bhauth · 2023-10-30T01:05:11.823Z · comments (1)

[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)

Linear encoding of character-level information in GPT-J token embeddings
mwatkins · 2023-11-10T22:19:14.654Z · comments (4)

Adam Smith Meets AI Doomers
James_Miller · 2024-01-31T15:53:03.070Z · comments (10)

[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)

Monthly Roundup #12: November 2023
Zvi · 2023-11-14T15:20:06.926Z · comments (5)

Wireheading and misalignment by composition on NetHack
pierlucadoro · 2023-10-27T17:43:41.727Z · comments (4)

Trying to deconfuse some core AI x-risk problems
habryka (habryka4) · 2023-10-17T18:36:56.189Z · comments (13)

Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)

Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

[link] GPT2, Five Years On
Joel Burget (joel-burget) · 2024-06-05T17:44:17.552Z · comments (0)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

Recent comments

avturchin on an Evangelion dialogue explaining the QACI alignment plan

Quantum immortality and gun jamming do not contradict each other: for example, if we survive 10 rounds of failures because of QI, we most likely survive only on those timelines where the gun is broken. So both QI and gun jamming can be true and support one another, and there is no contradiction.

annasalamon on Dragon Agnosticism

I don't see an advantage to remaining agnostic, compared to:

1) Acquire all the private truth one can.

Plus:

2) Tell all the public truth one is willing to incur the costs of, with priority for telling public truths about what one would and wouldn't share (e.g., prioritizing not posing as more truth-telling than one is).

--

The reason I prefer this policy to the OP's "don't seek truth on low-import highly-politicized matters" is that I fear not-seeking-truth begets bad habits. Also, I fear I may misunderstand how important things are if I allow politics to influence which topics-that-interest-my-brain I do/don't pursue, compared to my current policy of having some attentional budget for "anything that interests me, whether or not it seems useful/virtuous."

danielfilan on Seven lessons I didn't learn from election day

Yeah, but a bunch of people might actually answer how their neighbours will vote, given that that's what the pollster asked, and if the question is phrased as the post assumes, that's going to be a massive issue.

dakara on If we solve alignment, do we die anyway?

James, thank you for a well-written comment. It was a pleasure to read. Looking forward to Seth's response. Genuinely interested in hearing his thoughts.

avturchin on Anthropically Blind: the anthropic shadow is reflectively inconsistent

One problem here is that quantum immortality and angel immortality eventually merge: for example, if we survive 10 LHC failures because of QI, we most likely survive only on those timelines where some alien stops the LHC. So both QI and angel immortality can be true and support one another, and there is no contradiction.

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

I know this post and have two problems with it. First, what they call "anthropic shadow" is not the proper term: Bostrom, in his article of the same name, defined the anthropic shadow as the underestimation of past risks based on the fact of our survival. But it's OK.

The more serious problem is that quantum immortality and angel immortality eventually merge: for example, if we survive 10 LHC failures because of QI, we most likely survive only on those timelines where some alien stops the LHC. So both QI and angel immortality can be true and support one another, and there is no contradiction.

donald-hobson on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

We can discuss anything that exists, that might exist, that did exist, that could exist, and that could not exist. So no matter what form your predict-the-next-token language model takes, if it is trained over the entire corpus of the written word, the representations it forms will be pretty hard to understand, because the representations encode an entire understanding of the entire world.

Perhaps.

Imagine a huge number of very skilled programmers tried to manually hard-code a ChatGPT in Python.

Ask this pyGPT to play chess, and it will play chess. Look under the hood, and you see a chess engine programmed in. Ask it to solve algebra problems, and there is a symbolic algebra package in there. All in neat, well-commented code.

Ask it to compose poetry, and you find some algorithm that checks whether two words rhyme, a syllable counter, etc.

Rot13 is done with a hardcoded rot13 algorithm.
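To make "a hardcoded rot13 algorithm" concrete, here is a minimal sketch of what one such human-readable part of a hypothetical pyGPT might look like. The function name `rot13` is illustrative only; it assumes plain ASCII letters are rotated and every other character passes through unchanged.

```python
def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; leave all other characters unchanged."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            # Shift within the lowercase alphabet, wrapping around after 'z'.
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            # Same shift within the uppercase alphabet.
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(rot13("Hello, World!"))  # prints "Uryyb, Jbeyq!"
```

Because 13 is half of 26, applying `rot13` twice returns the original text; every step is inspectable, which is the property the thought experiment is after.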

Somewhere in the algorithm is a giant list of facts, containing "Penguins Live In Antarctica". And if you change this fact to say "Penguins Live in Canada", then the AI will believe this. (Or spot its inconsistency with other facts?)

And with one simple change, the AI believes this consistently. Penguins appear when this AI is asked for poems about Canada, and don't appear in poems about Antarctica.

When asked about the native Canadian diet, it will speculate that this likely included penguin, but say that it doesn't know of any documented examples of this.
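A toy sketch of the penguin example, assuming the simplest possible design: separate "modules" that all read one shared fact table. Everything here (`FACTS`, `animals_in`, `poem_about`, `diet_guess`) is a hypothetical illustration, not a claim about how ChatGPT or any real system works. It only shows why a single edit would propagate consistently in a fully programmatic design.

```python
# Hypothetical toy model of the "pyGPT" idea: separate, readable modules that all
# read from one shared fact table, so editing one fact changes every answer.

# Shared, human-editable fact table: (subject, relation) -> object.
FACTS: dict[tuple[str, str], str] = {
    ("penguin", "lives_in"): "Antarctica",
}


def animals_in(place: str) -> list[str]:
    """Look up every subject the fact table says lives in `place`."""
    return [subject for (subject, relation), obj in FACTS.items()
            if relation == "lives_in" and obj == place]


def poem_about(place: str) -> str:
    """Toy 'poetry module': mentions whichever animals live in the place."""
    line = f"O {place}, land of ice and sky"
    animals = animals_in(place)
    if animals:
        line += ", where " + " and ".join(a + "s" for a in animals) + " wander by"
    return line + "."


def diet_guess(place: str) -> str:
    """Toy 'history module': speculates about a local diet from the same facts."""
    animals = animals_in(place)
    if not animals:
        return f"I have no recorded animals for {place}."
    return (f"The traditional diet in {place} likely included {', '.join(animals)}, "
            "but I know of no documented examples of this.")


if __name__ == "__main__":
    print(poem_about("Canada"))                # no penguins yet
    FACTS[("penguin", "lives_in")] = "Canada"  # the single edit
    print(poem_about("Canada"))                # penguins now appear here...
    print(poem_about("Antarctica"))            # ...and have left this poem
    print(diet_guess("Canada"))
```

The hard part the comment points at is not this lookup but the scale: a real system would need an enormous number of such facts and modules, which is why the question of whether some algorithm could generate that code matters.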

Can you build something with ChatGPT level performance entirely out of human comprehensible programmatic parts?

Obviously having humans program these parts directly would be slow. (We are still talking about a lot of code.) But if some algorithm could generate that code?

avturchin on Anthropically Blind: the anthropic shadow is reflectively inconsistent

Check out my new post, which favors the longest and thickest timelines: https://www.lesswrong.com/posts/hB2CTaxqJAeh5jdfF/quantum-immortality-a-perspective-if-ai-doomers-are-probably?commentId=aAzrogWBqtFDqMMpp

dennis-zoeller on Anvil Problems

That’s a fantastic memory aid for this concept, much appreciated! Crafting games in general give ample examples for internalizing this kind of bootstrap mentality, and also for quickly scaling to the next anvil-equivalent. As you touched upon, real life has a deep crafting tree, with anvil problems upon anvil problems. Something that took me far too long to learn: if you've got your anvil but still don't find yourself where you want to be, it pays to find the next anvil problem quickly. If you still have a lot of distance to cover, don't get bogged down by things that won't get you the next anvil-equivalent.

In a certain way, relationships have their own anvils. There are thresholds of trust and modes of communication that take investment. However, they also unlock completely new options, particularly when addressing challenges or navigating high-stress situations. I sometimes notice, in myself and others, a neglect of serious work on relationships during good times, and then a lack of the tools to handle difficulties when they arise.

dakara on AI Control: Improving Safety Despite Intentional Subversion

I do believe that Shubhorup's second edit provides another case for concern. If you combine his proposal with steganography, the collusion team has another way of successfully circumventing Ryan's safety strategy.

This collusion team's strategy should also ideally be addressed.