LessWrong 2.0 Reader

Representation Tuning
Christopher Ackerman (christopher-ackerman) · 2024-06-27T17:44:33.338Z · comments (9)

Open consultancy: Letting untrusted AIs choose what answer to argue for
Fabien Roger (Fabien) · 2024-03-12T20:38:03.785Z · comments (5)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (5)

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (14)

Fertility Roundup #4
Zvi · 2024-12-02T14:30:05.968Z · comments (16)

[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)

Rolling Thresholds for AGI Scaling Regulation
Larks · 2025-01-12T01:30:23.797Z · comments (3)

Fireplace and Candle Smoke
jefftk (jkaufman) · 2025-01-01T01:50:01.408Z · comments (4)

Basics of Handling Disagreements with People
Camille Berger (Camille Berger) · 2024-11-12T17:55:08.143Z · comments (4)

Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy
Joe Rogero · 2024-11-12T23:55:46.770Z · comments (17)

Alternative Cancer Care As Biohacking & Book Review: Surviving "Terminal" Cancer
DenizT · 2025-01-06T07:43:52.773Z · comments (6)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (3)

“Charity” as a conflationary alliance term
Jan_Kulveit · 2024-12-12T21:49:50.057Z · comments (2)

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)

[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)

Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)

[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)

Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)

[link] Romae Industriae
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-19T13:03:31.536Z · comments (2)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)

AI Impacts Survey: December 2023 Edition
Zvi · 2024-01-05T14:40:06.156Z · comments (6)

[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)

[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)

AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic · 2024-05-17T00:25:42.950Z · comments (12)

If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)

[link] GPT2, Five Years On
[deleted] · 2024-06-05T17:44:17.552Z · comments (0)

One True Love
Zvi · 2024-02-09T15:10:05.298Z · comments (7)

Adam Smith Meets AI Doomers
James_Miller · 2024-01-31T15:53:03.070Z · comments (10)

[link] legged robot scaling laws
bhauth · 2024-01-20T05:45:56.632Z · comments (8)

Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

Childhood and Education Roundup #7
Zvi · 2024-12-09T13:10:05.588Z · comments (10)

AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)

AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory
DanielFilan · 2024-11-27T06:30:03.821Z · comments (0)

[link] The last era of human mistakes
owencb · 2024-07-24T09:58:42.116Z · comments (2)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (27)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (17)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)

Musings on Text Data Wall (Oct 2024)
Vladimir_Nesov · 2024-10-05T19:00:21.286Z · comments (2)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (7)

What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)

[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)

Recent comments

viliam on Making LLMs safer is more intuitive than you think: How Common Sense and Diversity Improve AI Alignment

Models could incorporate long-term predictions, ensuring decisions align with future sustainability and impact goals.
Researchers implement mechanisms where AI models autonomously recognize misalignment, shut down harmful behaviors, or even self-destruct (via wiping out network weights) for severe cases.
We could build modular AI systems with distinct sub-components focusing on different objectives (e.g., overall goal, ethical considerations, social implications). These agents could check each other's outputs, flagging potential high-risk conflicts or misalignment.

This feels like you either misunderstood the problem or are responding with circular logic.

The problem with alignment is that we don't know how to do alignment.

Your proposals:

do alignment with future sustainability
recognize misalignment
recognize other agents' misalignment

I repeat: the problem is that we don't know how to "do alignment" (or "recognize misalignment").

*

As an analogy, imagine that someone tells you "I don't know how to swim", and your advice would be:

keep your head safely above the water
don't drown
when swimming together with other people at the same skill level, check on each other to make sure no one is drowning

Well, if I knew how to keep my head safely above the water and how not to drown, I wouldn't be asking the question in the first place.

marius-hobbhahn on What’s the short timeline plan?

If I had more time, I would have written a shorter post ;)

cousin_it on Cast it into the fire! Destroy it!

What about biological augmentation of intelligence? I think if other avenues are closed, this one can still go pretty far, enough to make society very stratified and incomprehensible and so on. You can imagine biological self-improving intelligences too.

So maybe if you're serious about closing all avenues, it amounts to creating a god that will forever watch over everything and prevent things from becoming too smart. It doesn't seem like such a good idea anymore.

stephen-fowler on Do Antidepressants work? (First Take)

"cannot imagine a study that would convince me that it "didn't work" for me, in the ways that actually matter. The effects on my mind kick in sharply, scale smoothly with dose, decay right in sync with half-life in the body, and are clearly noticeable not just internally for my mood but externally in my speech patterns, reaction speeds, ability to notice things in my surroundings, short term memory, and facial expressions."

The drug actually working would mean that your life is better after 6 years of taking the drug compared to the counterfactual where you took a placebo.

The observations you describe are explained by you simply having a chemical dependency on a drug that you have been on for 6 years.

tom-davidson on Human takeover might be worse than AI takeover

Why are they more recoverable? Seems like a human who seized power would seek ASI advice on how to cement their power.

simon-pepin-lehalleur on Dmitry's Koan

I mentioned samples and expectations for the TLBP because it seems possible (and suggested by the role of degeneracies in SLT) that different samples can correspond to qualitatively different degradations of the model. Cartoon picture: besides the robust circuit X of interest, there are "fragile" circuits A and B, and most samples at a given loss scale degrade either A or B but not both.

I agree that there is no strong reason to overindex on the Watanabe temperature, which is derived from an idealised situation: global Bayesian inference, degeneracies exactly at the optimal parameters, "relatively finite variance", etc. The scale you propose seems quite natural but I will let LLC-practitioners comment on that.

mattmacdermott on Tips On Empirical Research Slides

I found this really useful, thanks! I especially appreciate details like how much time you spent on slides at first, and how much you do now.

viliam on Action: how do you REALLY go about doing?

In regard to my situation and why I'm presenting you my ideas, I'm an amateur thinker who is wishing to popularise and spread my ideas and is outside the intellectual community so I'm in need of help in spreading my ideas so if you're feeling generous I'd like to ask for help from any of you reading to spread my ideas.

I think you have skipped a few steps here.

First, you need to have some good ideas. They do not necessarily need to be original; popularizing existing good ideas is also a great thing.

Second, you need to get good at explaining things. Write clearly, provide specific examples, etc.

Then, you can write a few articles and people will be happy to share them.

It seems like you think that you are currently at step three. To me it seems like you are still struggling with step one (or maybe step two). I have no idea what ideas you want to spread.

cousin_it on Applying traditional economic thinking to AGI: a trilemma

Sure. But in an economy with AIs, humans won't be like Bob. They'll be more like Carl the bottom-percentile employee who struggles to get any job at all. Even in today's economy lots of such people exist, so any theoretical argument saying it can't happen has got to be wrong.

And if the argument is quantitative - say, that the unemployment rate won't get too high - then imagine an economy with 100x more AIs than people, where unemployment is only 1% but all people are unemployed. There's no economic principle saying that can't happen.
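The arithmetic behind that scenario can be checked with a minimal sketch. It assumes that AIs count as part of the labor force, that "100x more AIs than people" means 100 AIs per human, and that every AI is employed; the function name and the population figures are hypothetical and chosen only for illustration.

```python
def unemployment_rate(humans: int, ais: int,
                      unemployed_humans: int, unemployed_ais: int = 0) -> float:
    """Share of all workers (humans plus AIs) who are unemployed.

    Assumption: AIs are counted in the labor force alongside humans.
    """
    total = humans + ais
    if total == 0:
        raise ValueError("labor force is empty")
    return (unemployed_humans + unemployed_ais) / total


# Hypothetical numbers: 1,000 humans, 100 AIs per human, every human out of work.
rate = unemployment_rate(humans=1_000, ais=100_000, unemployed_humans=1_000)
print(f"{rate:.2%}")  # prints 0.99%
```

Under these assumptions the headline rate is 1/101, just under 1%, even though human unemployment is 100%.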

zack_m_davis on A Hill of Validity in Defense of Meaning

(Self-review.) I think this pt. 2 is the second most interesting entry in my Whole Dumb Story memoir sequence. (Pt. 1 deals with more niche psychology stuff than the philosophical malpractice covered here; pt. 3 is more of a grab-bag of stuff that happened between April 2019 and January 2021; pt. 4 is the climax. Expect the denouement pt. 5 in mid-2025.)

I feel a lot more at peace having this out there. (If we can't have justice, sanity, or language, at least I got to tell my story about trying to protect them.)

The 8 karma in 97 votes is kind of funny in how nakedly political it is. (I think it was higher before the post got some negative attention on Twitter.)

Given how much prereading and editing effort had already gone into this, it's disappointing that I didn't get the ending right the first time. (I ended up rewriting some of the paragraphs at the end after initial publication, because it didn't land in the comments section the way I wanted it to.)

Subsection titles would have also been a better choice for such a long piece (which was rectified for the publication of pts. 3 and 4); I may still add them.