LessWrong 2.0 Reader

How Logic "Really" Works: An Engineering Perspective
Daniil Strizhov (mila-dolontaeva) · 2025-04-16T05:34:09.443Z · comments (0)

Opportunity to learn more about AI Innovation & Security Policy
PolicyTakes · 2025-04-16T01:35:27.203Z · comments (0)

D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (2)

[link] Should AIs be Encouraged to Cooperate?
PeterMcCluskey · 2025-04-15T21:57:06.096Z · comments (0)

OpenAI rewrote its Preparedness Framework
Zach Stein-Perlman · 2025-04-15T20:00:50.614Z · comments (1)

[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (4)

[link] Nucleic Acid Observatory Updates, April 2025
jefftk (jkaufman) · 2025-04-15T18:58:29.839Z · comments (0)

Some OthelloGPT Circuits
Alfred Wong (alfred-wong) · 2025-04-15T18:41:36.216Z · comments (0)

The Mirror Problem in AI: Why Language Models Say Whatever You Want
RobT · 2025-04-15T18:40:02.793Z · comments (1)

What happens when LLMs learn new things? & Continual learning forever.
sunchipsster · 2025-04-15T18:38:35.166Z · comments (0)

To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (5)

[link] AISN #51: AI Frontiers
Corin Katzke (corin-katzke) · 2025-04-15T16:01:56.701Z · comments (1)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (15)

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)

[link] The real reason AI benchmarks haven’t reflected economic impacts
Noosphere89 (sharmake-farah) · 2025-04-15T13:44:06.225Z · comments (0)

Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (1)

[link] 3M Subscriber YouTube Account 'Channel 5' Reporting On Rationalism
sakraf · 2025-04-15T13:02:33.736Z · comments (0)

Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (2)

Risers for Foot Percussion
jefftk (jkaufman) · 2025-04-15T11:10:08.577Z · comments (0)

What empirical research directions has Eliezer commented positively on?
Chris_Leong · 2025-04-15T08:53:41.677Z · comments (1)

Debunking the Hard Problem: Consciousness as Integrated Prediction
gmax (maxim-gurevich) · 2025-04-15T08:38:50.637Z · comments (8)

How to Defend the Indefensible
Alex Beyman (alexbeyman) · 2025-04-15T07:45:15.971Z · comments (0)

A Talmudic Rationalist Cautionary Tale
Noah Birnbaum (daniel-birnbaum) · 2025-04-15T04:11:16.972Z · comments (1)

Creating 'Making God': a Feature Documentary on risks from AGI
Connor Axiotes (connor-axiotes-1) · 2025-04-15T02:56:09.206Z · comments (0)

A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (20)

$500 bounty for best short-form fiction about our near future world; $100 for recommending winning piece: new “Art of Near Future World” quarterly art project
Ramon Gonzalez (ramon-gonzalez) · 2025-04-15T00:46:10.637Z · comments (0)

What if there was a nuke in Manhattan and why that could be a good thing
Ratburn · 2025-04-15T00:19:41.844Z · comments (10)

[link] Nihilism Is Not Enough By Peter Thiel
shawkisukkar · 2025-04-15T00:13:01.375Z · comments (0)

Correcting Deceptive Alignment using a Deontological Approach
JeaniceK · 2025-04-14T22:07:57.860Z · comments (0)

Religious Persistence: A Missing Primitive for Robust Alignment
lauriewired · 2025-04-14T22:03:45.868Z · comments (3)

[link] The 4-Minute Mile Effect
Parker Conley (parker-conley) · 2025-04-14T21:41:27.726Z · comments (3)

Lightning Talks!
nathandunkerley · 2025-04-14T20:39:17.593Z · comments (0)

The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (5)

[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)

Sam Altman's sister claims Sam sexually abused her -- Part 7: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T17:43:28.897Z · comments (0)

Sam Altman's sister claims Sam sexually abused her -- Part 8: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T17:42:53.705Z · comments (0)

[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (23)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak (tomek-korbak) · 2025-04-14T16:45:46.584Z · comments (0)

Applications Open for Impact Accelerator Program for Experienced Professionals
Clark Wisenbaker (accounts-hip) · 2025-04-14T16:27:32.340Z · comments (0)

The Last Light
Bridgett Kay (bridgett-kay) · 2025-04-14T15:41:02.745Z · comments (0)

Offer: Team Conflict Counseling for AI Safety Orgs
Severin T. Seehrich (sts) · 2025-04-14T15:17:00.835Z · comments (1)

[link] Slopworld 2035: The dangers of mediocre AI
titotal (lombertini) · 2025-04-14T13:14:08.390Z · comments (6)

Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (2)

Monthly Roundup #29: April 2025
Zvi · 2025-04-14T11:50:02.324Z · comments (6)

A Solution to Sandbagging and other Self-Provable Misalignment: Constitutional AI Detectives
Knight Lee (Max Lee) · 2025-04-14T10:27:24.903Z · comments (2)

One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)

[link] Unbendable Arm as Test Case for Religious Belief
Ivan Vendrov (ivan-vendrov) · 2025-04-14T01:57:12.013Z · comments (29)

Sam Altman's sister claims Sam sexually abused her -- Part 5: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-14T01:00:07.084Z · comments (0)

Luna Lovegood and the Chamber of Secrets, Part 5
Kongo Landwalker (kongo-landwalker) · 2025-04-14T00:10:36.028Z · comments (0)

Sam Altman's sister claims Sam sexually abused her -- Part 4: Timeline, continued
pythagoras5015 (pl5015) · 2025-04-13T23:41:55.411Z · comments (0)

Recent comments

tag on Debunking the Hard Problem: Consciousness as Integrated Prediction

If this doesn’t count as an explanation (or at least a concrete hypothesis), what would one look like to you?

Something that takes a brain state as input and outputs a quale, i.e. solves the "Mary's Room" problem. And does it in a principled way, not just a lookup table of known correlations.

richard_kennaway on Debunking the Hard Problem: Consciousness as Integrated Prediction

It’s like asking why high kinetic energy “feels” hot. It doesn’t, heat is just how the brain models signals from temperature receptors and maps them into the self-model.

We know how high (random) kinetic energy causes a high reading on a thermometer.

We do not know why this "feels hot" to people but (we presume) not to a thermometer. Or if you think, as some have claimed to, that it might actually "feel hot" to a strand of mercury in a glass tube, how would you go about finding out, given that in the case of a thermometer, we already know all the relevant physical facts about why the line lengthens and shrinks?

Sections 4 and 5 explain why this evolved: it’s a useful way for the brain to prioritize action when reflexes aren’t enough. You “feel” something because that’s how your brain tracks itself and the environment.

This is redefining the word "feel", not accounting for the thing that "feel" ordinarily points to.

The same thing happened to the word "sensation" when mechanisms of the sensory organs were being traced out. The mechanism of how sensations "feel" (the previous meaning of the word "sensation") was never found, and "sensation" came to be used to mean only those physical mechanisms. This is why the word "quale" (pl. qualia) was revived, to refer to what there was no longer a name for, the subjective experience of "sensations" (in the new sense).

The OP, for all its length, appears to be redefining the word "conscious" to mean "of a system, that it contains a model of itself". It goes into great detail and length on phenomena of self-modelling and speculations of why they may have arisen, and adds the bald assertion, passim, that this is what consciousness is. The original concept that it aims and claims to explain is not touched on.

mitchell_porter on Debunking the Hard Problem: Consciousness as Integrated Prediction

You say consciousness = successful prediction. What happens when the predictions are wrong?

davey-morse on ASI existential risk: Reconsidering Alignment as a Goal

I'm saying the issue of whether ASI gets out of control is not fundamental to the discussion of whether ASI poses an xrisk or how to avert it.

The control question is not fundamental to discussion of whether ASI poses x-risk—agreed. But as to discussion of how to avert x-risk, the control question is fundamental.

Humanity's optimal strategy for averting x-risk depends on whether we can ultimately control ASI. If control is possible, then the best strategy for averting x-risk is coordination of ASI development—across companies and nations. If control is not possible, then the best strategy is very different and even less well-defined (e.g., pausing ASI development, attempting to seed ASI so that it becomes benevolent, making preparations so humans can live alongside self-directed ASI, etc).

So while it's possible that emphasis on the control question turns many people away from the x-risk conversation, I think the control question remains key for conversation about x-risk solutions.

tailcalled on johnswentworth's Shortform

I think self-supervised learning is perfectly capable of discovering rare but important patterns. Just look at today’s foundation models, which seem pretty great at that.

This I'd dispute. If your model is underparameterized (which I think is true for the typical model?), then it can't learn any patterns that only occur once in the data. And even if the model is overparameterized, it still can't learn any pattern that never occurs in the data.

I think you’re conflating consequentialism and understanding in a weird-to-me way. (Or maybe I’m misunderstanding.)
I think consequentialism is related to choosing one action versus another action. I think understanding (e.g. predicting the consequence of an action) is different, and that in practice understanding has to involve self-supervised learning.

I'm saying that intelligence is the thing that allows you to handle patterns. So if you've got a dataset, intelligence allows you to build a model that makes predictions for other data based on the patterns it can find in said dataset. And if you have a function, intelligence allows you to find optima for said function based on the patterns it can find in said function.

Consequentialism is a way to set up intelligence to be agent-ish. This often involves setting up something that's meant to build an understanding of actions based on data or experience.

One could in principle cut my definition of consequentialism up into self-supervised learning and true consequentialism (this seems like what you are doing..?). One disadvantage with that is that consequentialist online learning is going to have a very big effect on the dataset one ends up training the understanding on, so they're not really independent of each other. Either way that just seems like a small labelling thing to me.

maxnadeau on The 4-Minute Mile Effect

Typo: should be "Gell-Mann"

jonas-hallgren on ASI existential risk: Reconsidering Alignment as a Goal

I would have liked more emphasis on institutional capacity as part of the solution, but I think this is a very good way of describing a more general refocusing away from Goodharting on sub-parts of the problem.

Now that I've said something serious, I can finally comment on what I wanted to comment on:

Thanks to Ted Chiang, Toby Ord, and Hannu Rajaniemi for conversations which improved this piece.

Ted Chiang!!! I'm excited for some banger sci-fi based on things more relating to x-risk scenarios, that is so cool!

mitchell_porter on ASI existential risk: Reconsidering Alignment as a Goal

I knew the author (Michael Nielsen) once but didn't stay in touch... I had a little trouble figuring out what he actually advocates here, e.g. at the end he talks about increasing "the supply of safety", and lists "differential technological development" (Bostrom), "d/acc" (Buterin), and "coceleration" (Nielsen) as "ongoing efforts" that share this aim, without defining any of them. But following his links, I would define those in turn as "slowing down dangerous things, and speeding up beneficial things"; "focusing on decentralization and individual defense"; and "advancing safety as well as advancing capabilities".

In this particular essay, his position seems similar to contemporary MIRI. MIRI gave up on alignment in favor of just stopping the stampede towards AI, and here Michael is also saying that people who care about AI safety should work on topics other than alignment (e.g. "institutions, norms, laws, and education"), because (my paraphrase) alignment work is just adding fuel to the fire of advances in AI.

Well, let's remind ourselves of the current situation. There are two AI powers in the world, America and China (and plenty of other nations who would gladly join them in that status). Both of them are hosting a capabilities race in which multiple billion-dollar companies compete to advance AI, and "making the AI too smart" is not something that either side cares about. We are in a no-brakes race towards superintelligence, and alignment research is the only organized effort aimed at making the outcome human-friendly.

I think plain speaking is important at this late stage, so let me also try to be as clear as possible about how I see our prospects.

First, the creation of superintelligence will mean that humanity is no longer in control, unless human beings are somehow embedded in it. Superintelligence may or may not coexist with us (I don't know the odds of it emerging in a human-friendly form), but it will have the upper hand; we will be at its mercy. If we don't intend to just gamble on there being a positive outcome, we need alignment research. For that matter, if we really didn't want to gamble, we wouldn't create superintelligence until we had alignment theory perfectly worked out. But we don't live in that timeline.

Second, although we are not giving ourselves time to solve alignment safely, that still has a chance of happening, if rising capabilities are harnessed to do alignment research. If we had no AI, maybe alignment theory would take 20 or 50 years to solve, but with AI, years of progress can happen in months or weeks. I don't know the odds of alignment getting fully solved in that way, but the ingredients are there for it to happen.

I feel I should say something on the prospect of a global pause or a halt occurring. I would call it unlikely but not impossible. It looks unlikely because we are in a decentralized no-holds-barred race towards superintelligence already, and the most advanced AIs are looking pretty capable (despite some gaps, e.g. 1, 2), and there's no serious counterforce on the political scene. It's not impossible because change, even massive change, does happen in politics and geopolitics, and there's only a finite number of contenders in the race (though that number grows every year).

knight-lee on keltan's Shortform

I agree this stuff is addictive. AI makes things more interactive. Some people who never considered themselves vulnerable got sucked into AI relationships.

Possible pushback:

What if short bits of addictive content generated by humans (but selected by algorithms) are already near max addictiveness? And by the time AI can design/write a video game etc. twice as addictive as what humans can design, we already have a superintelligence explosion, and either addiction is solved or we are dead?

knight-lee on To be legible, evidence of misalignment probably has to be behavioral

When Gemini randomly told an innocent user to go kill himself, it made the news, but this news didn't really affect very much in the big picture.

It's possible that relevant decision-makers don't care that much about dramatic bad behaviours since the vibe is "oh yeah AI glitches up, oh well."

It's possible that relevant decision-makers do care more about what the top experts believe, and if the top experts are convinced that current models already want to kill you (but can't), it may have an effect. Imagine if many top experts agree that "the lie detectors start blaring like crazy when the AI is explaining how it won't kill all humans even if it can get away with it."

I'm not directly disagreeing with this post, I'm just saying there exists this possible world model where behavioural evidence isn't much stronger (than other misalignment evidence).