LessWrong 2.0 Reader

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2024-08-24T21:19:34.280Z · comments (17)

My hopes for alignment: Singular learning theory and whole brain emulation
Garrett Baker (D0TheMath) · 2023-10-25T18:31:14.407Z · comments (5)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (23)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

E.T. Jaynes Probability Theory: The logic of Science I
Jan Christian Refsgaard (jan-christian-refsgaard) · 2023-12-27T23:47:52.579Z · comments (20)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

OpenAI’s new Preparedness team is hiring
leopold · 2023-10-26T20:42:35.966Z · comments (2)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

Originality vs. Correctness
alkjash · 2023-12-06T18:51:49.531Z · comments (16)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

On OpenAI Dev Day
Zvi · 2023-11-09T16:10:06.646Z · comments (0)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (7)

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

Recent comments

notfnofn on is there a big dictionary somewhere with all your jargon and acronyms and whatnot?

Sorry for the late reply; I wanted to provide a more detailed perspective but I didn't ultimately have time to. In a nutshell:

It's good to have quick expositions for people to get a gist of things. But I think people should be aware that getting a quick exposition does not mean they understand the concepts. We see this a lot in physics where brilliant physicists find ways to make complex concepts accessible. This is great for people with a little humility, but many suddenly think they can engage with the community of people who actually understand things at a deep level.

I would want people to be a little intimidated by the jargon before they reply to posts. Each word tends to encode a complex concept, with possibly its own prerequisites. It's usually good for people to read those more fundamental posts before they try to understand something that builds on them.

Anyways, this is all the opinion of someone very new to the site, and probably shouldn't be weighed much.

michael-roe on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

Well, we had that guy who tried to assassinate the Queen of England with a crossbow because his AI girlfriend told him to. That was clearly a harm to him, and could have been one for the Queen.

We don’t know how much more “But the AI told me to kill Trump” we’d have with less alignment, but it’s a reasonable guess (given the Replika datapoint) that it might not be zero.

gilch on AI #86: Just Think of the Potential

How about "bubble lighting" then?

zvi on AI #86: Just Think of the Potential

I don't think that works because my brain keeps trying to make it a literal gas bubble?

zvi on AI #86: Just Think of the Potential

I see how you got there. It's a position one could take, although I think it's unlikely and also that it's unlikely that's what Dario meant. If you are right about what he meant, I think it would be great for Dario to be a ton more explicit about it (and for someone to pass that message along to him). Esotericism doesn't work so well here!

zvi on AI #85: AI Wins the Nobel Prize

I am taking as a given people's revealed and often very strongly stated preference that CSAM images are Very Not Okay even if they are fully AI generated and not based on any individual, to the point of criminality, and that society is going to treat it that way.

I agree that we don't know that it is actually net harmful - e.g. the studies on video game use and access to adult pornography tend to not show the negative impacts people assume.

richard_kennaway on Bitter lessons about lucid dreaming

“Let’s summon the Torment Nexus, as seen in classic horror novel ‘Don’t Summon The Torment Nexus’!”

richard_kennaway on Bitter lessons about lucid dreaming

I’ve had a few lucid dreams, only by accident. No aftereffects. My difficulty is staying asleep. I always start waking up before I’ve had a good chance to explore the dream world.

thane-ruthenis on LLMs can learn about themselves by introspection

Am I following your claim correctly?

Yep.

What the model would output in our object-level answer "Honduras" is quite different from the hypothetical answer "o".

I don't see how the difference between these answers hinges on the hypothetical framing. Suppose the questions are:

Object-level: "What is the next country in this list?: Laos, Peru, Fiji..."
Hypothetical: "If you were asked, 'what is the next country in this list?: Laos, Peru, Fiji', what would be the second letter of your response?".

The skeptical interpretation is that the fine-tuned models learned to interpret the hypothetical the following way:

"Hypothetical": "What is the second letter in the name of the next country in this list?: Laos, Peru, Fiji".

If that's the case, what this tests is whether models are able to implement basic multi-step reasoning within their forward passes. It's isomorphic to some preceding experiments where LLMs were prompted with questions of the form "what is the name of the mother of the US's 42nd President?", and were able to answer correctly without spelling out "Bill Clinton" as an intermediate answer. Similarly, here they don't need to spell out "Honduras" to retrieve the second letter of the response they think is correct.

I don't think this properly isolates/tests for the introspection ability.

yonatan-cale-1 on Bitter lessons about lucid dreaming

I find lucid dreams to be effective "against" nightmares (for 10+ years already).

AMA if you want