LessWrong 2.0 Reader



[link] Contra Ngo et al. “Every ‘Every Bay Area House Party’ Bay Area House Party”
Ricki Heicklen (bayesshammai) · 2024-02-22T23:56:02.318Z · comments (5)
Thinking By The Clock
Screwtape · 2023-11-08T07:40:59.936Z · comments (27)
Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (40)
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
JanB (JanBrauner) · 2023-09-28T18:53:58.896Z · comments (38)
Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (45)
[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (21)
AI as a science, and three obstacles to alignment strategies
So8res · 2023-10-25T21:00:16.003Z · comments (80)
Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (6)
Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)
The other side of the tidal wave
KatjaGrace · 2023-11-03T05:40:05.363Z · comments (85)
There should be more AI safety orgs
Marius Hobbhahn (marius-hobbhahn) · 2023-09-21T14:53:52.779Z · comments (25)
OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)
Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)
"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity
Thane Ruthenis · 2023-12-16T20:08:39.375Z · comments (34)
re: Yudkowsky on biological materials
bhauth · 2023-12-11T13:28:10.639Z · comments (30)
[link] Toward a Broader Conception of Adverse Selection
Ricki Heicklen (bayesshammai) · 2024-03-14T22:40:57.920Z · comments (61)
Every "Every Bay Area House Party" Bay Area House Party
Richard_Ngo (ricraz) · 2024-02-16T18:53:28.567Z · comments (6)
[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (22)
Effective Aspersions: How the Nonlinear Investigation Went Wrong
TracingWoodgrains (tracingwoodgrains) · 2023-12-19T12:00:23.529Z · comments (170)
WTH is Cerebrolysin, actually?
gsfitzgerald (neuroplume) · 2024-08-06T20:40:53.378Z · comments (22)
Architects of Our Own Demise: We Should Stop Developing AI Carelessly
Roko · 2023-10-26T00:36:05.126Z · comments (75)
Critical review of Christiano's disagreements with Yudkowsky
Vanessa Kosoy (vanessa-kosoy) · 2023-12-27T16:02:50.499Z · comments (40)
Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)
'Empiricism!' as Anti-Epistemology
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-03-14T02:02:59.723Z · comments (86)
[link] President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
Tristan Williams (tristan-williams) · 2023-10-30T11:15:38.422Z · comments (39)
Thomas Kwa's MIRI research experience
Thomas Kwa (thomas-kwa) · 2023-10-02T16:42:37.886Z · comments (53)
2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)
Thoughts on the AI Safety Summit company policy requests and responses
So8res · 2023-10-31T23:54:09.566Z · comments (14)
Evaluating the historical value misspecification argument
Matthew Barnett (matthew-barnett) · 2023-10-05T18:34:15.695Z · comments (142)
Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)
[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)
The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda
Cameron Berg (cameron-berg) · 2023-12-18T20:35:01.569Z · comments (21)
This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)
RSPs are pauses done right
evhub · 2023-10-14T04:06:02.709Z · comments (70)
[link] The King and the Golem
Richard_Ngo (ricraz) · 2023-09-25T19:51:22.980Z · comments (16)
How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (54)
Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (86)
Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)
My thoughts on the social response to AI risk
Matthew Barnett (matthew-barnett) · 2023-11-01T21:17:08.184Z · comments (37)
You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)
DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)