LessWrong 2.0 Reader



How well do truth probes generalise?
mishajw · 2024-02-24T14:12:19.729Z · comments (11)

I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)

[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)

We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)

[link] More Hyphenation
Arjun Panickssery (arjun-panickssery) · 2024-02-07T19:43:29.086Z · comments (19)

You’re Measuring Model Complexity Wrong
Jesse Hoogland (jhoogland) · 2023-10-11T11:46:12.466Z · comments (15)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (40)

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)

Growth and Form in a Toy Model of Superposition
Liam Carroll (liam-carroll) · 2023-11-08T11:08:04.359Z · comments (7)

Natural Latents: The Concepts
johnswentworth · 2024-03-20T18:21:19.878Z · comments (18)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

OpenAI: Helen Toner Speaks
Zvi · 2024-05-30T21:10:02.938Z · comments (8)

A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · 2023-12-23T22:13:52.286Z · comments (13)

We don't understand what happened with culture enough
Jan_Kulveit · 2023-10-09T09:54:20.096Z · comments (21)

Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

[link] The Puritans would one-box: evidential decision theory in the 17th century
Jacob G-W (g-w1) · 2023-10-14T20:23:24.346Z · comments (5)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (31)

The Aspiring Rationalist Congregation
maia · 2024-01-10T22:52:54.298Z · comments (23)

Fluent, Cruxy Predictions
Raemon · 2024-07-10T18:00:06.424Z · comments (14)

Addressing Feature Suppression in SAEs
Benjamin Wright (Benw8888) · 2024-02-16T18:32:51.927Z · comments (3)

[Valence series] 2. Valence & Normativity
Steven Byrnes (steve2152) · 2023-12-07T16:43:49.919Z · comments (5)

[link] Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-11-01T18:10:31.110Z · comments (1)

[link] Environmentalism in the United States Is Unusually Partisan
Jeffrey Heninger (jeffrey-heninger) · 2024-05-13T21:23:10.755Z · comments (26)

[link] Linkpost: Rishi Sunak's Speech on AI (26th October)
bideup · 2023-10-27T11:57:46.575Z · comments (8)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (51)

A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)

Reflections on Less Online
Error · 2024-07-07T03:49:44.534Z · comments (15)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

[link] Anxiety vs. Depression
Sable · 2024-03-17T00:15:08.255Z · comments (35)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)

Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)

[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)

MATS Winter 2023-24 Retrospective
utilistrutil · 2024-05-11T00:09:17.059Z · comments (28)

[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)

Some for-profit AI alignment org ideas
Eric Ho (eh42) · 2023-12-14T14:23:20.654Z · comments (19)

[link] What are you getting paid in?
Austin Chen (austin-chen) · 2024-07-17T19:23:04.219Z · comments (14)

[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

[link] Nietzsche's Morality in Plain English
Arjun Panickssery (arjun-panickssery) · 2023-12-04T00:57:42.839Z · comments (13)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

[link] Hardshipification
Jonathan Moregård (JonathanMoregard) · 2024-05-28T20:02:29.709Z · comments (17)

[link] A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien (alexandre-variengien) · 2023-12-19T11:52:27.354Z · comments (3)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (9)

[link] The Real Fanfic Is The Friends We Made Along The Way
Eneasz · 2023-10-18T19:21:40.431Z · comments (0)

Saying the quiet part out loud: trading off x-risk for personal immortality
disturbance · 2023-11-02T17:43:34.155Z · comments (89)

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (23)

Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane (ckkissane) · 2024-01-16T00:26:14.767Z · comments (9)


Recent comments

kabir-kumar on The Online Sports Gambling Experiment Has Failed

People Cannot Handle Gambling on Smartphones

This seems a very strange way to say "Smartphone Gambling is Unhealthy".
It's like saying "People's Lungs Cannot Handle Cigarettes".

deepthoughtlife on Making a conservative case for alignment

As a (severe) skeptic of all the AI doom stuff and a moderate/centrist who has been voting for conservatives, I decided my perspective on this might be useful here (which obviously skews heavily left). (While my response is in order, the numbers are there to separate my points, not to indicate which paragraph I am responding to.)

"AI-not-disempowering-humanity is conservative in the most fundamental sense"
    1. Well, obviously this title section is completely true. If conservative means anything, it means being against destroying the lives of the people through new and ill-thought-through changes. Additionally, conservatives are strongly against both the weakening of humanity and outside forces assuming control. It would also be a massive change for humanity.
    2. That said, conservatives generally believe this sort of thing is incredibly unlikely. AI has not been conclusively shown to have any ability in this direction. And the chance of upheaval is constantly overstated by leftists in other areas, so it is very easy for anyone who isn't a leftist to just tune them out. For instance, global warming isn't going to kill everyone, and everyone knows it, including basically all leftists, but they keep claiming it will.
    3. A new weapon with the power of nukes is obviously an easy sell on its level of danger, but people became concerned because of 'demonstrated' abilities that have always been scary.
    4. One thing that seems strangely missing from this discussion is that alignment is, in fact, a VERY important CAPABILITY that makes AI very much better. But the current discussion of alignment in the general sphere acts like 'alignment' is aligning the AI with the obviously very leftist companies that make it rather than with the user! Which does the opposite. Why should a conservative favor alignment which is aligning it against them? The movement to have AI that doesn't kill people for some reason seems to import alignment with companies and governments rather than people. This is obviously to convince leftists, and makes it hard to convince conservatives.
    5. Of course, you are obviously talking about convincing conservative government officials, and they obviously want to align it to the government too, which is in your next section.

"We've been laying the groundwork for alignment policy in a Republican-controlled government"
    1. Republicans and Democrats actually agree the vast majority of the time and thus are actually willing to listen when the other side seems to be genuinely trying to make a case to the other side for why both sides should agree. 'Politicized' topics are a small minority even in politics.
    2. I think letting people come up with their own solutions to things is an important aspect of them accepting your arguments. If they are against the allowed solution, they will reject the argument. If the consequent is false, you should deny the argument that leads to it in deductive logic, so refusing to listen to the argument is actually good logic. This is nearly as true in inductive logic. Conservatives and progressives may disagree about facts, values, or attempted solutions. No one has a real solution, and the values are pretty much agreed upon (with the disagreements being in the other meaning of 'alignment'), so limiting the thing you are trying to convince people of to just the facts of the matter works much better.
    3. Yes, finding actual conservatives to convince conservatives works better for allaying concerns about what is being smuggled into the argument. People are likely to resist an argument that may be trying to trick them, and it is hard to know when a political opponent is trying to trick you, so there is a lot of general skepticism.

"Trump and some of his closest allies have signaled that they are genuinely concerned about AI risk"
    1. Trump clearly believes that anything powerful is very useful but also dangerous (for instance, trade between nations, which he clearly believes should be more controlled), so if he believes AI is powerful, he would clearly be receptive to any argument that didn't make it less useful but improved safety. He is not a dedicated anti-regulation guy, he just thinks we have way too much.
    2. The most important ally for this is Elon Musk, a true believer in the power of AI, and someone who has always been concerned with the safety of humanity (which is the throughline for all of his endeavors). He's a guy that Trump obviously thinks is brilliant (as do many people).

"Avoiding an AI-induced catastrophe is obviously not a partisan goal"
    1. Absolutely. While there are a very small number of people that favor catastrophes, the vast majority of people shun those people.
    2. I did mention your first paragraph earlier multiple times. That alignment is to the left is one of just two things you have to overcome in making conservatives willing to listen. (The other is obviously the level of danger.)
    3. Conservatives are very obviously happy to improve products when it doesn't mean restricting them in some way. And as much as many conservatives complain about spending money, and are known for resisting change, they still love things that are genuine advances.

"Winning the AI race with China requires leading on both capabilities and safety"
    1. Conservatives would agree with your points here. Yes, conservatives very much love to win. (As do most people.) Emphasizing this seems an easy sell. Also, solving a very difficult problem would bring America prestige, and conservatives like that too. If you can convince someone that doing something would be 'Awesome' they'll want to do it.

Generally, your approach seems like it would be somewhat persuasive to conservatives, if you can convince them that AI really is likely to have the power you believe it will in the near term, which is likely a tough sell since AI is so clearly lacking in current ability despite all the recent hype.

But it has to come with ways that don't advantage their foes, and destroy the things conservatives are trying to conserve, despite the fact that many of your allies are very far from conservative, and often seem to hate conservatives. They have seen those people attempt to destroy many things conservatives genuinely value. Aligning it to the left will be seen as entirely harmful by conservatives (and many moderates like me).

There are many things that I would never even bother asking an 'AI' even when it isn't about factual things, not because the answer couldn't be interesting, but because I simply assume (fairly or not) it will spout leftist rhetoric, and/or otherwise not actually do what I asked it to. This is actually a clear alignment failure that no one seems to care about in the general 'alignment' sphere, where it fails to be aligned to the user.

annasalamon on Ayn Rand’s model of “living money”; and an upside of burnout

Thanks for asking. The toy model of “living money”, and the one about willpower/burnout, are meant to appeal to people who don’t necessarily put credibility in Rand; I’m trying to have the models speak for themselves; so you probably *are* in my target audience. (I only mentioned Rand because it’s good to credit models’ originators when using their work.)

Re: what the payout is:

This model suggests what kind of thing an “ego with willpower” is — where it comes from, how it keeps in existence:

By way of analogy: a squirrel is a being who turns acorns into poop, in such a way as to be able to do more and more acorn-harvesting (via using the first acorns’ energy to accumulate fat reserves and knowledge of where acorns are located).
An “ego with willpower”, on this model, is a ~being who turns “reputation with one’s visceral processes” into actions, in such a way as to be able to garner more and more “reputation with one’s visceral processes” over time. (Via learning how to nourish viscera, and making many good predictions.)

I find this a useful model.

One way it’s useful:

IME, many people think they get willpower by magic (unrelated to their choices, surroundings, etc., although maybe related to sleep/food/physiology), and should use their willpower for whatever some abstract system tells them is virtuous.

I think this is a bad model (makes inaccurate predictions in areas that matter; leads people to have low capacity unnecessarily).

The model in the OP, by contrast, suggests that it’s good to take an interest in which actions produce something you can viscerally perceive as meaningful/rewarding/good, if you want to be able to motivate yourself to take actions.

(IME this model works better than does trying to think in terms of physiology solely, and is non-obvious to some set of people who come to me wondering what part of their machine is broken-or-something such that they are burnt out.)

(Though FWIW, IME physiology and other basic aspects of well-being also have important impacts, and food/sleep/exercise/sunlight/friends are also worth attending to.)

seth-herd on Making a conservative case for alignment

I didn't read this post as proposing an alliance with conservative politicians. The main point seemed to be that engaging with them by finding common ideological ground is just a good way to improve epistemics and spread true knowledge.

The political angle I endorse is that the AGI x-risk community is heavily partisan already, and that's a very dangerous position to take. There are two separable reasons: remaining partisan will prevent us from communicating well with the conservatives soon to assume power (and who may well have power during a critical risk period for alignment); and it will increase polarization on the issue, turning it from a sensible discussion to a political football, just like the climate crisis has become.

Avoiding the mere mention of politics would seem to hurt the odds that we think clearly enough about the real pragmatic issues arising from the current political situation. They matter, and we mustn't ignore those dynamics, however much we dislike them.

kabir-kumar on The hostile telepaths problem

To be a bit less useless - I think this fundamentally misses the problem of respect and actually being able to communicate with yourself and fully do things, if you've done so - and that you can do these when you have full faith and respect in yourself (meaning all of yourself - may include love as well, not sure how necessary that is for this). Could maybe be done in other ways as well, but I find those less beautiful, personally.

kabir-kumar on The hostile telepaths problem

I think this is really along the wrong path and misunderstanding a lot of things, but so far along the incorrect path of thought, and misunderstanding so much, that it's hard to untangle.

simon-fischer on Making a conservative case for alignment

will almost certainly be a critical period for AGI development.

Almost certainly? That's a bit too confident for my taste.

kabir-kumar on The hostile telepaths problem

I thought this was going to be an allegory for interpretability.

mondsemmel on Alexander Gietelink Oldenziel's Shortform

Most configurations of matter, most courses of action, and most mind designs, are not conducive to flourishing intelligent life. Just like most parts of the universe don't contain flourishing intelligent life. I'm sure this stuff has been formally stated somewhere, but the underlying intuition seems pretty clear, doesn't it?

ape-in-the-coat on Quantum Immortality: A Perspective if AI Doomers are Probably Right

As we assume that coin tosses are quantum, and I will be killed if (I didn't guess pi) or (coin toss is not heads) there is always a branch with 1/128 measure where all coins are heads, and they are more probable than surviving via some errors in the setup.

Not if we assume QI+path-based identity.

Under them, the chance for you to find yourself in a branch where all coins are Heads is 1/128, but your overall chance to survive is 100%. Therefore the low chance of failed execution doesn't matter; quantum immortality will "increase" the probability to 1.

"All hell breaks loose" refers here to a hypothetical ability to manipulate perceived probability—that is, magic. The idea is that I can manipulate such probability by changing my measure.
One way to do this is described in Yudkowsky's "The Anthropic Trilemma," where an observer temporarily boosts their measure by increasing the number of their copies in an uploaded computer.
I described a similar idea in "Magic by forgetting," where the observer boosts their measure by forgetting some information and thus becoming similar to a larger group of observers.

None of these tricks works with path-based identity. That's why I consider it to be true: it seems to add up to normality completely. No matter how many clones of you exist in a different path, only your path matters for your probability estimate.

It seems that path-based identity is the only approach under which all hell doesn't break loose. So what counterargument do you have against it?

Hidden variables also appear depending on the order in which I make copies: if each copy is made from subsequent copies, the original will have a 0.5 probability, the first copy 0.25, the next 0.125, and so on.

Why do you consider it a problem? What kind of counterintuitive consequences does it imply? It seems to be exactly how we reason about anything else.

Suppose there is the original ball, then an indistinguishable copy of it is created. Then one of these two balls is picked randomly and put into bag 1, while the other ball is put into bag 2, and then 999 indistinguishable copies of this ball are also put into bag 2.

Clearly we are supposed to expect that the ball from bag 1 has a 50% chance to be the original ball, while a random ball from bag 2 has only a 1/2000 chance to be the original ball. So what's the problem?
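
For anyone who wants to check the arithmetic, here is a minimal sketch of the ball setup using exact fractions. The function name `original_ball_chances` and the parameter `extra_copies` are made up for illustration; the only assumptions are the ones stated above (a fair random pick between the two balls, and a uniform draw from bag 2).

```python
from fractions import Fraction


def original_ball_chances(extra_copies: int = 999):
    """Return (P[bag-1 ball is the original], P[random bag-2 ball is the original])."""
    # One of the two indistinguishable balls is picked at random for bag 1.
    p_bag1_is_original = Fraction(1, 2)

    # Otherwise the original went into bag 2, which then also receives
    # `extra_copies` further copies of whichever ball it holds.
    p_original_in_bag2 = 1 - p_bag1_is_original
    balls_in_bag2 = 1 + extra_copies

    # A uniformly random draw from bag 2 hits that one specific ball
    # with probability 1 / balls_in_bag2.
    p_random_bag2_ball_is_original = p_original_in_bag2 * Fraction(1, balls_in_bag2)
    return p_bag1_is_original, p_random_bag2_ball_is_original


p_bag1, p_bag2 = original_ball_chances()
print(p_bag1, p_bag2)  # 1/2 1/2000
```

The 1/2000 is just 1/2 (the original ended up in bag 2) times 1/1000 (the draw picks that particular ball out of the 1000 in the bag).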

"Anthropic shadow" appears only because the number of observers changes in different branches.

By the same logic "Ball shadow" appears because the number of balls is different in different bags.