LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

story-based decision-making
bhauth · 2024-02-07T02:35:27.286Z · comments (11)

Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)

Public Call for Interest in Mathematical Alignment
Davidmanheim · 2023-11-22T13:22:09.558Z · comments (9)

Based Beff Jezos and the Accelerationists
Zvi · 2023-12-06T16:00:08.380Z · comments (29)

Stagewise Development in Neural Networks
Jesse Hoogland (jhoogland) · 2024-03-20T19:54:06.181Z · comments (1)

Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)

[link] Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan (akbir-khan) · 2024-02-07T21:28:10.694Z · comments (14)

On the abolition of man
Joe Carlsmith (joekc) · 2024-01-18T18:17:06.201Z · comments (18)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (13)

[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)

Covert Malicious Finetuning
Tony Wang (tw) · 2024-07-02T02:41:51.698Z · comments (4)

[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

Natural Latents: The Concepts
johnswentworth · 2024-03-20T18:21:19.878Z · comments (18)

We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (40)

How well do truth probes generalise?
mishajw · 2024-02-24T14:12:19.729Z · comments (11)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (31)

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)

[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)

[link] More Hyphenation
Arjun Panickssery (arjun-panickssery) · 2024-02-07T19:43:29.086Z · comments (19)

I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · 2023-12-23T22:13:52.286Z · comments (13)

Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

The Aspiring Rationalist Congregation
maia · 2024-01-10T22:52:54.298Z · comments (23)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

OpenAI: Helen Toner Speaks
Zvi · 2024-05-30T21:10:02.938Z · comments (8)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)

Fluent, Cruxy Predictions
Raemon · 2024-07-10T18:00:06.424Z · comments (14)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (51)

Reflections on Less Online
Error · 2024-07-07T03:49:44.534Z · comments (15)

Addressing Feature Suppression in SAEs
Benjamin Wright (Benw8888) · 2024-02-16T18:32:51.927Z · comments (3)

Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)

[Valence series] 2. Valence & Normativity
Steven Byrnes (steve2152) · 2023-12-07T16:43:49.919Z · comments (5)

A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)

[link] Environmentalism in the United States Is Unusually Partisan
Jeffrey Heninger (jeffrey-heninger) · 2024-05-13T21:23:10.755Z · comments (26)

[link] Anxiety vs. Depression
Sable · 2024-03-17T00:15:08.255Z · comments (35)

[link] Nietzsche's Morality in Plain English
Arjun Panickssery (arjun-panickssery) · 2023-12-04T00:57:42.839Z · comments (13)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)

[link] A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien (alexandre-variengien) · 2023-12-19T11:52:27.354Z · comments (3)

MATS Winter 2023-24 Retrospective
utilistrutil · 2024-05-11T00:09:17.059Z · comments (28)

Some for-profit AI alignment org ideas
Eric Ho (eh42) · 2023-12-14T14:23:20.654Z · comments (19)

[link] Hardshipification
Jonathan Moregård (JonathanMoregard) · 2024-05-28T20:02:29.709Z · comments (17)

[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

cata on OpenAI Email Archives (from Musk v. Altman)

Thanks for not only doing this but noting the accuracy of the unchecked transcript, it's always hard work to build a mental model of how good LLM tools are at what stuff.

sam-marks on Lao Mein's Shortform

I'm quite happy for laws to be passed and enforced via the normal mechanisms. But I think it's bad for policy and enforcement to be determined by Elon Musk's personal vendettas. If Elon tried to defund the AI safety institute because of a personal vendetta against AI safety researchers, I would have some process concerns, and so I also have process concerns when these vendettas are directed against OAI.

seth-herd on OpenAI Email Archives (from Musk v. Altman)

Rings true. I'm not sure it pushes me much on the ethics of OpenAI; somebody else had a good idea for a philosophy and a name to push for AI in a certain (maybe dumb) direction; they recognized it as a good idea and appropriated it for their own similar project. Should they have used a more different name? Probably. Should they have used a more different philosophical argument? No. Should they have brought Guy Ravine on board? Probably not; his vision for how the thing would actually go was very different from theirs, and none of his skills were really that relevant. He'd have been in arguments with them from the start.

Is this the right way for industry to work? Nope. But nobody knows how to properly give credit for good but broad ideas.

None of this is to endorse anything or anyone related to OpenAI, just to say it's pretty standard practice.

annasalamon on Ayn Rand’s model of “living money”; and an upside of burnout

Yes, this is a good point, relates to why I claimed at top that this is an oversimplified model. I appreciate you using logic from my stated premises; helps things be falsifiable.

It seems to me:

Somehow people who are in good physical health wake up each day with a certain amount of restored willpower. (This is inconsistent with the toy model in the OP, but is still my real / more-complicated model.)
Noticing spontaneously-interesting things can be done without willpower; but carefully noticing superficially-boring details and taking notes in hopes of later payoff indeed requires willpower, on my model. (Though, for me, less than e.g. going jogging requires.)
If you’ve just been defeated by a force you weren’t tracking, that force often becomes spontaneously-interesting. Thus people who are burnt out can sometimes take a spontaneous interest in how willpower/burnout/visceral motivation works, and can enjoy “learning humbly” from these things.
There’s a way burnout can help cut through ~dumb/dissociated/overconfident ideological frameworks (e.g. “only AI risk is interesting/relevant to anything”), and make space for other information to have attention again, and make it possible to learn things not in one's model. Sort of like removing a monopoly business from a given sector, so that other thingies have a shot again.

I wish the above was more coherent/model-y.

myles-h on What are Emotions?

There is no such thing as "inherent value"

Does this also mean there is no such thing as "inherent good"? If so, then one cannot say, "X is good", they would have to say "I think that X is good", for "good" would be a fact of their mind, not the environment.

This is what I thought the whole field of morality is about. Defining what is "good" in an objective fundamental sense.

And if "inherent good" can exist but not "inherent value", how would "good" be defined for it wouldn't be allowed to use "value" in its definition.

kabir-kumar on The Online Sports Gambling Experiment Has Failed

People Cannot Handle Gambling on Smartphones

this seems a very strange way to say "Smartphone Gambling is Unhealthy"
It's like saying "People's Lungs Cannot Handle Cigarettes"

deepthoughtlife on Making a conservative case for alignment

As a (severe) skeptic of all the AI doom stuff and a moderate/centrist that has been voting for conservatives I decided my perspective on this might be useful here (which obviously skews heavily left). (While my response is in order, the numbers are there to separate my points, not to give which paragraph I am responding to.)

"AI-not-disempowering-humanity is conservative in the most fundamental sense"
    1.Well, obviously this title section is completely true. If conservative means anything, it means being against destroying the lives of the people through new and ill-though through changes. Additionally, conservatives are both strongly against the weakening of humanity and of outside forces assuming control. It would also be a massive change for humanity.
    2.That said, conservatives generally believe this sort of thing is incredibly unlikely. AI has not been conclusively shown to have any ability in this direction. And the chance of upheaval is constantly overstated by leftists in other areas, so it is very easy for anyone who isn't to just tune them out. For instance, global warming isn't going to kill everyone, and everyone knows it including basically all leftists, but they keep claiming it will.
    3.A new weapon with the power of nukes is obviously an easy sell on its level of danger, but people became concerned because of 'demonstrated' abilities that have always been scary.
    4.One thing that seems strangely missing from this discussion is that alignment is in fact, a VERY important CAPABILITY that makes it very much better. But the current discussion of alignment in the general sphere acts like 'alignment' is aligning the AI with the obviously very leftist companies that make it rather than with the user! Which does the opposite. Why should a conservative favor alignment which is aligning it against them? The movement to have AI that doesn't kill people for some reason seems to import alignment with companies and governments rather than people. This is obviously to convince leftists, and makes it hard to convince conservatives.
    5.Of course, you are obviously talking about convincing conservative government officials, and they obviously want to align it to the government too, which is in your next section.

"We've been laying the groundwork for alignment policy in a Republican-controlled government"
    1.Republicans and Democrats actually agree the vast majority of the time and thus are actually willing to listen when the other side seems to be genuinely trying to make a case to the other side for why both sides should agree. 'Politicized' topics are a small minority even in politics.
    2.I think letting people come up with their own solutions to things is an important aspect of them accepting your arguments. If they are against the allowed solution, they will reject the argument. If the consequent is false, you should deny the argument that leads to it in deductive logic, so refusing to listen to the argument is actually good logic. This is nearly as true in inductive logic. Conservatives and progressives may disagree about facts, values, or attempted solutions. No one has a real solution, and the values are pretty much agreed upon (with the disagreements being in the other meaning of 'alignment'), so limiting the thing you are trying to convince people of to just the facts of the matter works much better.
    3.Yes, finding actual conservatives to convince conservatives works better for allaying concerns about what is being smuggled into the argument. People are likely to resist an argument that may be trying to trick them, and it is hard to know when a political opponent is trying to trick you so there is a lot of general skepticism.

"Trump and some of his closest allies have signaled that they are genuinely concerned about AI risk"
1.Trump clearly believes that anything powerful is very useful but also dangerous (for instance, trade between nations, which he clearly believes should be more controlled), so if he believes AI is powerful, he would clearly be receptive to any argument that didn't make it less useful but improved safety. He is not a dedicated anti-regulation guy, he just thinks we have way too much.
2.The most important ally for this is Elon Musk, a true believer in the power of AI, and someone who has always been concerned with the safety of humanity (which is the throughline for all of his endeavors). He's a guy that Trump obviously thinks is brilliant (as do many people).

"Avoiding an AI-induced catastrophe is obviously not a partisan goal"
    1.Absolutely. While there are a very small number of people that favor catastrophes, the vast majority of people shun those people.
    2.I did mention your first paragraph earlier multiple times. That alignment is to the left is one of just two things you have to overcome in making conservatives willing to listen. (The other is obviously the level of danger.)
    3.Conservatives are very obviously happy to improve products when it doesn't mean restricting them in some way. And as much as many conservatives complain about spending money, and are known for resisting change, they still love things that are genuine advances.

"Winning the AI race with China requires leading on both capabilities and safety"
1.Conservatives would agree with your points here. Yes, conservatives very much love to win. (As do most people.) Emphasizing this seems an easy sell. Also, solving a very difficult problem would bring America prestige, and conservatives like that too. If you can convince someone that doing something would be 'Awesome' they'll want to do it.

Generally, your approach seems like it would be somewhat persuasive to conservatives, if you can convince them that AI really is likely to have the power you believe it will in the near term, which is likely a tough sell since AI is so clearly lacking in current ability despite all the recent hype.

But it has to come with ways that don't advantage their foes, and destroy the things conservatives are trying to conserve, despite the fact that many of your allies are very far from conservative, and often seem to hate conservatives. They have seen those people attempt to destroy many things conservatives genuinely value. Aligning it to the left will be seen as entirely harmful by conservatives (and many moderates like me).

There are many things that I would never even bother asking an 'AI' even when it isn't about factual things, not because the answer couldn't be interesting, but because I simply assume (fairly or not) it will spout leftist rhetoric, and/or otherwise not actually do what I asked it to. This is actually a clear alignment failure that no one seems to care about in the general 'alignment' sphere where It fails to be aligned to the user.

annasalamon on Ayn Rand’s model of “living money”; and an upside of burnout

Thanks for asking. The toy model of “living money”, and the one about willpower/burnout, are meant to appeal to people who don’t necessarily put credibility in Rand; I’m trying to have the models speak for themselves; so you probably *are* in my target audience. (I only mentioned Rand because it’s good to credit models’ originators when using their work.)

Re: what the payout is:

This model suggests what kind of thing an “ego with willpower” is — where it comes from, how it keeps in existence:

By way of analogy: a squirrel is a being who turns acorns into poop, in such a way as to be able to do more and more acorn-harvesting (via using the first acorns’-energy to accumulate fat reserves and knowledge of where acorns are located).
An “ego with willpower”, on this model, is a ~being who turns “reputation with one’s visceral processes” into actions, in such a way as to be able to garner more and more “reputation with one’s visceral processes” over time. (Via learning how to nourish viscera, and making many good predictions.)

I find this a useful model.

One way it’s useful:

IME, many people think they get willpower by magic (unrelated to their choices, surroundings, etc., although maybe related to sleep/food/physiology), and should use their willpower for whatever some abstract system tells them is virtuous.

I think this is a bad model (makes inaccurate predictions in areas that matter; leads people to have low capacity unnecessarily).

The model in the OP, by contrast, suggests that it’s good to take an interest in which actions produce something you can viscerally perceive as meaningful/rewarding/good, if you want to be able to motivate yourself to take actions.

(IME this model works better than does trying to think in terms of physiology solely, and is non-obvious to some set of people who come to me wondering what part of their machine is broken-or-something such that they are burnt out.)

(Though FWIW, IME physiology and other basic aspects of well-being also has important impacts, and food/sleep/exercise/sunlight/friends are also worth attending to.)

seth-herd on Making a conservative case for alignment

I didn't read this post as proposing an alliance with conservative politicians. The main point seemed to be that engaging with them by finding common ideological ground is just a good way to improve epistemics and spread true knowledge.

The political angle I endorse is that the AGI x-risk community is heavily partisan already, and that's a very dangerous position to take. There are two separable reasons: remaining partisan will prevent us from communicating well with the conservatives soon to assume power (and who may well have power during a critical risk period for alignment); and it will increase polarization on the issue, turning it from a sensible discussion to a political football, just like the climate crisis has become.

Avoiding the mere mention of politics would seem to hurt the the odds that we think clearly enough about the real pragmatic issues arising from the current political situation. They matter, and we mustn't ignore those dynamics, however much we dislike them.

kabir-kumar on The hostile telepaths problem

To be a bit less useless - I think this fundamentally misses the problem of respect and actually being able to communicate with yourself and fully do things, if you've done so - and that you can do these when you have full faith and respect in yourself (meaning all of yourself - may include love as well, not sure how necessary that is for this). Could maybe be done in other ways as well, but I find those less beautiful, personally.