LessWrong 2.0 Reader

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)

A short dialogue on comparability of values
cousin_it · 2023-12-20T14:08:29.650Z · comments (7)

[link] [Linkpost] Concept Alignment as a Prerequisite for Value Alignment
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-11-04T17:34:36.563Z · comments (0)

[link] Lying is Cowardice, not Strategy
Connor Leahy (NPCollapse) · 2023-10-24T13:24:25.450Z · comments (73)

When and why should you use the Kelly criterion?
Garrett Baker (D0TheMath) · 2023-11-05T23:26:38.952Z · comments (25)

[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (11)

How to develop a photographic memory 2/3
PhilosophicalSoul (LiamLaw) · 2023-12-30T20:18:14.255Z · comments (7)

EA Infrastructure Fund's Plan to Focus on Principles-First EA
Linch · 2023-12-06T03:24:55.844Z · comments (0)

Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

Without Fundamental Advances, Rebellion and Coup d'État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko · 2024-01-31T10:14:02.042Z · comments (34)

Housing Roundup #6
Zvi · 2023-09-20T13:10:01.443Z · comments (8)

Survey on the acceleration risks of our new RFPs to study LLM capabilities
Ajeya Cotra (ajeya-cotra) · 2023-11-10T23:59:52.515Z · comments (1)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

Fifteen Lawsuits against OpenAI
Remmelt (remmelt-ellen) · 2024-03-09T12:22:09.715Z · comments (4)

[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)

Can the House Legislate?
jefftk (jkaufman) · 2023-10-05T13:40:06.649Z · comments (6)

Steering subsystems: capabilities, agency, and alignment
Seth Herd · 2023-09-29T13:45:00.739Z · comments (0)

Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)

Agent membranes/boundaries and formalizing “safety”
Chipmonk · 2024-01-03T17:55:21.018Z · comments (46)

The Overkill Conspiracy Hypothesis
ymeskhout · 2023-10-20T16:51:20.308Z · comments (8)

Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)

A list of all the deadlines in Biden's Executive Order on AI
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-11-01T17:14:31.074Z · comments (2)

Facebook is Paying Me to Post
jefftk (jkaufman) · 2023-11-14T19:10:07.303Z · comments (5)

Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)

Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)

Singular learning theory and bridging from ML to brain emulations
kave · 2023-11-01T21:31:54.789Z · comments (16)

Reprograming the Mind: Meditation as a Tool for Cognitive Optimization
Jonas Hallgren · 2024-01-11T12:03:41.763Z · comments (3)

[link] How to Upload a Mind (In Three Not-So-Easy Steps)
aggliu · 2023-11-13T18:13:32.893Z · comments (0)

My Mid-Career Transition into Biosecurity
jefftk (jkaufman) · 2023-10-02T21:20:06.768Z · comments (4)

Essaying Other Plans
Screwtape · 2024-03-06T22:59:06.240Z · comments (4)

Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)

AISC Project: Modelling Trajectories of Language Models
NickyP (Nicky) · 2023-11-13T14:33:56.407Z · comments (0)

The Limitations of GPT-4
p.b. · 2023-11-24T15:30:30.933Z · comments (12)

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra
aphyer · 2024-01-16T22:44:52.424Z · comments (1)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)

Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)

Talk: AI safety fieldbuilding at MATS
Ryan Kidd (ryankidd44) · 2024-06-23T23:06:37.623Z · comments (2)

AI debate: test yourself against chess 'AIs'
Richard Willis · 2023-11-22T14:58:10.847Z · comments (35)

Losing Metaphors: Zip and Paste
jefftk (jkaufman) · 2023-11-29T20:31:07.464Z · comments (6)

Recent comments

brendan-long on ASIs will not leave just a little sunlight for Earth

The argument using Bernard Arnault doesn't really work. He (probably) won't give you $77 because if he gave everyone $77, he'd spend a very large portion of his wealth. But we don't need an AI to give us billions of Earths. Just one would be sufficient. Bernard Arnault would probably be willing to spend $77 to prevent the extinction of a (non-threatening) alien species.

(This is not a general-purpose argument against worrying about AI; I just don't think this particular argument works.)

lao-mein on ASIs will not leave just a little sunlight for Earth

This area could really use better economic analysis. It seems obvious to me that some subset of workers can be pushed below subsistence, at least locally (imagine farmers being unable to afford rent because mechanized cotton plantations can out-bid them for farmland). Surely there are conditions where this would be true for most humans.

There should be a simple one-sentence counter-argument to "Trade opportunities always increase population welfare", but I'm not sure what it is.

davekasten on davekasten's Shortform

Preregistering intent to write "Every Bay Area Walled Compound" (hat tip: Emma Liddell)

Unrelatedly, I am in Berkeley through Wednesday afternoon; let me know if you're around and would like to chat.

thane-ruthenis on Another argument against utility-centric alignment paradigms

Eh, the way I phrased that statement, I'd actually meant that an AGI aligned to human values would also be a subject of AGI-doom arguments, in the sense that it'd exhibit instrumental convergence, power-seeking, et cetera. It wouldn't do that in the domains where that'd be at odds with its values (for example, in cases where that'd be violating human agency), but that's true of all other AGIs as well. (A paperclip-maximizer wouldn't erase its memory of what "a paperclip" is to free up space for combat plans.)

In particular, that statement certainly wasn't intended as a claim that an aligned AGI is impossible. Just that its internal structure would likely be that of an embedded agent, and that if the free parameter of its values were changed, it'd be an extinction threat.

thane-ruthenis on Another argument against utility-centric alignment paradigms

I agree that the agent-foundations research has been somewhat misaimed from the start, but I buy this explanation of John's regarding where it went wrong and how to fix it. Basically, what we need to figure out is a theory of embedded world-modeling, which would capture the aspect of reality where the universe naturally decomposes into hierarchically arranged sparsely interacting subsystems. Our agent would then be a perfect game-theoretic agent, but defined over that abstract (and lazy) world-model, rather than over the world directly.

This would take care of agents needing to be "bigger" than the universe, counterfactuals, the "outside-view" problem, the realizability and the self-reference problems, the problem of hypothesis spaces, and basically everything else that's problematic about embedded agency.

thane-ruthenis on Another argument against utility-centric alignment paradigms

Do you endorse [the claim that any "generally intelligent system capable of autonomously optimizing the world the way humans can" would necessarily be well-approximated as a game-theoretic agent?]

Yes.

Because humans sure don't seem like paperclipper-style utility maximizers to me.

Humans are indeed hybrid systems. But I would say that inasmuch as they act as generally intelligent systems capable of autonomously optimizing the world in scarily powerful ways, they do act as game-theoretic agents. E.g., people who are solely focused on resource accumulation, and don't have self-destructive vices or any distracting values they're not willing to sacrifice to Moloch, tend to indeed accumulate power at a steady rate. At a smaller scope, people tend to succeed at those of their long-term goals that they've clarified for themselves and doggedly pursue, and not to succeed at them if they flip-flop between different passions on a daily basis.

I've been meaning to do some sort of literature review solidly backing this claim, actually, but it hasn't been a priority for me. Hmm, maybe it'd be easy with the current AI tools...

sharmake-farah on A shot at the diamond-alignment problem

For my money, the nice properties of human and AI systems that matter for alignment are, IMO, not the properties from Shard Theory but several others:

Alignment generalizes further than capabilities, because verifying is easier than generating and because learning values is easier than acquiring many other real-world capabilities.
It's looking like the values of humans are far, far simpler than a lot of the evopsych literature and Yudkowsky thought. Relatedly, values are less fragile than people thought 15-20 years ago, in the sense that they generalize far better OOD than people then expected.
The brain and DL AIs, while not the same thing, are doing reasonably similar things, such that we can transport a lot of AI insights into neuroscience/human brain insights, and vice versa.
One of those lessons is that Sutton's bitter lesson applies to human values and morals, which cashes out in the fact that the data matter much more than the algorithm when predicting a system's values, especially the OOD generalization of values, and thus controlling the data is basically equivalent to controlling the values.

trevorone on Lighthaven Sequences Reading Group #3 (Tuesday 09/24)

This is an idea and NOT a recommendation. Unintended consequences abound.

Have you thought about sorting into groups based on carefully-selected categories? For example, econ/social sciences vs quant background with extra whiteboard space, a separate group for new arrivals who didn't do the readings from the other weeks (as their perspectives will have less overlap), a separate group for people who deliberately took a bunch of notes and made a concise list vs a more casual easygoing group, etc?

finalformal2 on Laziness death spirals

I'd be interested in reading much more about this. Energy and akrasia, as it's popularly called here, continue to be my biggest life challenges. A high-fiber diet seems to help, and high novelty seems to help.

elityre on Applications of Chaos: Saying No (with Hastings Greer)

Why does the video show up so tiny?