LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Thinking By The Clock
Screwtape · 2023-11-08T07:40:59.936Z · comments (27)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (21)

The other side of the tidal wave
KatjaGrace · 2023-11-03T05:40:05.363Z · comments (85)

Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (6)

[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)

Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)

Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)

"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity
Thane Ruthenis · 2023-12-16T20:08:39.375Z · comments (34)

re: Yudkowsky on biological materials
bhauth · 2023-12-11T13:28:10.639Z · comments (30)

[link] Why I’m not a Bayesian
Richard_Ngo (ricraz) · 2024-10-06T15:22:45.644Z · comments (87)

Contra papers claiming superhuman AI forecasting
nikos (followtheargument) · 2024-09-12T18:10:50.582Z · comments (16)

[link] Toward a Broader Conception of Adverse Selection
Ricki Heicklen (bayesshammai) · 2024-03-14T22:40:57.920Z · comments (61)

Every "Every Bay Area House Party" Bay Area House Party
Richard_Ngo (ricraz) · 2024-02-16T18:53:28.567Z · comments (6)

Skills from a year of Purposeful Rationality Practice
Raemon · 2024-09-18T02:05:58.726Z · comments (18)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (52)

[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (22)

[question] Why is o1 so deceptive?
abramdemski · 2024-09-27T17:27:35.439Z · answers+comments (23)

Struggling like a Shadowmoth
Raemon · 2024-09-24T00:47:05.030Z · comments (38)

WTH is Cerebrolysin, actually?
gsfitzgerald (neuroplume) · 2024-08-06T20:40:53.378Z · comments (23)

Effective Aspersions: How the Nonlinear Investigation Went Wrong
TracingWoodgrains (tracingwoodgrains) · 2023-12-19T12:00:23.529Z · comments (170)

Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)

Critical review of Christiano's disagreements with Yudkowsky
Vanessa Kosoy (vanessa-kosoy) · 2023-12-27T16:02:50.499Z · comments (40)

'Empiricism!' as Anti-Epistemology
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-03-14T02:02:59.723Z · comments (90)

Did Christopher Hitchens change his mind about waterboarding?
Isaac King (KingSupernova) · 2024-09-15T08:28:09.451Z · comments (22)

This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)

[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)

Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)

The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda
Cameron Berg (cameron-berg) · 2023-12-18T20:35:01.569Z · comments (21)

Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (86)

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (33)

How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (54)

[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)

Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)

The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)

Is being sexy for your homies?
Valentine · 2023-12-13T20:37:02.043Z · comments (92)

You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)

[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)

[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)

And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)

DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)

Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)

Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)

[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)

What’s up with LLMs representing XORs of arbitrary features?
Sam Marks (samuel-marks) · 2024-01-03T19:44:33.162Z · comments (61)

[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 2024-05-21T20:15:36.502Z · comments (16)

[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

meme-marine on The Compendium, A full argument about extinction risk from AGI

I am unsurprised but disappointed to read the same Catastrophe arguments rehashed here, based on an outdated Bostromian paradigm of AGI. This is the main section I disagree with.

The underlying principle beneath these hypothetical scenarios is grounded in what we can observe around us: powerful entities control weaker ones, and weaker ones can fight back only to the degree that the more powerful entity isn’t all that powerful after all.

I do not think this is obvious or true at all. Nation-States are often controlled by a small group of people or even a single person, no different physiologically to any other human being. If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government; in fact, military coups are a commonplace global event. Yet, generally, most countries do not suffer constant coup attempts. We hold far fewer tools to "align" military leaders than we do AI models - we cannot control how generals were raised as children, cannot read their minds, cannot edit their minds.

I think you could also make a similar argument that big things control little things - with much more momentum and potential energy, we observe that large objects are dominant over small objects. Small objects can only push large objects to the extent that the large object is made of a material that is not very dense. Surely, then building vehicles substantially larger than people would result in uncontrollable runaways that would threaten human life and property! But in reality, runaway dump truck incidents are fairly uncommon. A tiny man can control a giant machine. Not all men can - only the one in the cockpit.

My point is that it is not at all obvious that a powerful AI would lack such a cockpit. If its goals are oriented around protecting or giving control to a set of individuals, I see no reason whatsoever why it would do a 180 and kill its commander, especially since the AI systems that we can build in practice are more than capable of understanding the nuances of their commands.

The odds of an average chess player with an ELO of 1200 against a grandmaster with ELO 2500 are 1 to a million. Against the best chess AI today with an ELO of 3600, the odds are essential 0.

Chess is a system that's perfectly predictable. Reality is a chaotic system. Chaotic systems - like a three-body orbital arrangement - are impossible to perfectly predict in all cases even if they're totally deterministic, because even minute inaccuracies in measurement can completely change the result. One example would be the edges of the Mandelbrot set. It's fractal. Therefore, even an extremely powerful AI would be beholden to certain probabilistic barriers, notwithstanding quantum-random factors.

Many assume that an AI is only dangerous if it has hostile intentions, but the danger of godlike AI is not a matter of its intent, but its power and autonomy. As these systems become increasingly agentic and powerful, they will pursue goals that will diverge from our own.

It would not be incorrect to describe someone who pursued their goals irrespective of its externalities to be malevolent. Bank robbers don't want to hurt people, they want money. Yet I don't think anyone would suggest that the North Hollywood shooters were "non-hostile but misaligned". I do not like this common snippet of rhetoric and I think it is dishonest. It attempts to distance these fears of misaligned AI from movie characters such as Skynet, but ultimately, this is the picture that is painted.

Goal divergence is a hallmark of the Bostromian paradigm - the idea that a misspecified utility function, optimized hypercompetently, would lead to disaster. Modern AI systems do not behave like this. They behave in a much more humanlike way. They do not have objective functions that they pursue doggedly. The Orthogonality Thesis states that intelligence is uncorrelated with objectives. The unstated connection here, I think, is that their initial goals must have been misaligned in the first place, but stated like this, it sounds a little like you expect a superintelligent AI to suddenly diverge from its instructions for no reason at all.

Overall, this is a very vague section. I think you would benefit from explaining some of the assumptions being made here.

I'm not going to go into detail on the Alignment section, but I think that many of its issues are similar to the ones listed above. I think that the arguments are not compelling enough for lay people, mostly because I don't think they're correct. I think that the definition of Alignment you have given - "the ability to “steer AI systems toward a person's or group's intended goals, preferences, and ethical principles.”" - does not match the treatment it is given. I think that it is obvious that the scope of Alignment is too vague, broad, and unverifiable for it to be a useful concept. I think that Richard Ngo's post:

https://www.lesswrong.com/posts/67fNBeHrjdrZZNDDK/defining-alignment-research [LW · GW]

is a good summary of the issues I see with the current idea of Alignment as it is often used in Rationalist circles and how it could be adapted to suit the world in which we find ourselves.

Finally, I think that the Governance section could very well be read uncharitably as a manifesto for world domination. Less than a dozen people attend PauseAI protests; you do not have the political ability to make this happen. The ideas contained in this document, which resemble many other documents, such as a similar one created by the PauseAI group, are not compelling enough to sway people who are not already believers in its ideas, and the Rationalist language used in them is anathemic to the largest ideological groups that would otherwise support your cause.

You may receive praise from Rationalist circles, but I do not think you will reach a large audience with this type of work. Leopold Aschenbrenner's essay managed to reach a fairly substantial audience, and it has similar themes to your document, so in principle, people are willing to read this sort of writing. The main flaw is that it doesn't add anything to the conversation, and because of that, it won't change anyone's minds. The reason that the public discourse doesn't involve Alignment talk isn't due to lack of awareness, it's because it isn't at all compelling to most people. Writing it better, with a nicer format, will not change this.

yair-halberstadt on Trading Candy

Counterpoint: when I was about 12, I was too old to collect candy at my Synagogue on Simchat Torah, so I would beg a single candy from someone, then trade it up (Dutch book style) with naive younger kids until I had a decent stash. I was particularly pleased whenever my traded up stash included the original sweet.

denkenberger on Is the Power Grid Sustainable?

It would be helpful to see a calculation with your rates, the installed cost of batteries, cost of the space taken up, losses in the batteries, any cost of maintenance, lifetime of batteries, and cost (or benefit) of disposal.

nathan-helm-burger on johnswentworth's Shortform

I like it. I do work that it, and The Narrow Path, are both missing how hard it will be to govern and restrict AI.

aprilsr on Alexander Gietelink Oldenziel's Shortform

I think it's pretty good to keep it in mind that heliocentrism is literally speaking just a change in what coordinate system you use, but it is legitimately a much more convenient coordinate system.

raemon on JargonBot Beta Test

Mmm, that does seem reasonable.

johnswentworth on Trading Candy

To be clear, I don't really think of myself as libertarian these days, though I guess it'd probably look that way if you just gave me a political alignment quiz.

To answer your question: I'm two years older than my brother, who is two years older than my sister.

zy on Trading Candy

I think that is probably not a good reason to be libertarian in my opinion? Could you also share maybe how much older were your than you siblings? If you are not that far apart, you and your siblings came from the same starting line, distributing is not going to happen in real life economically nor socially even if not libertarian (in real life, where we need equity is when the starting line is not the same and is not able to be changed by choice. A more similar analogy might be some kids are born with large ears, and large ears are favored by the society, and the large eared kids always get more candy). If you are ages apart with you being a lot older, it may make some limited sense to for your parents to re-distribute.

quila on JargonBot Beta Test

Let us know what you think!

the grey text feels disruptive to normal reading flow but idk why green link text wouldn't also be, maybe i'm just not used to it. e.g., in this post's "Curating technical posts" where 'Curating' is grey, my mind sees "<Curating | distinct term> technical posts" instead of [normal meaning inference not overfocused on individual words]

Is this useful, as a reader?

if the authors make sure they agree with all the definitions they allow into the glossary, yes. author-written definitions would be even more useful because how things are worded can implicitly convey things like, the underlying intuition, ontology, or related views they may be using wording to rule in or out.

Whenever an author with 100+ karma saves a draft of a post, our database queries a language model to:

i would prefer this be optional too, for drafts which are meant to be private (e.g. shared with a few other users, e.g. may contain possible capability-infohazards), where the author doesn't trust LM companies

johnswentworth on Ryan Kidd's Shortform

Y'know @Ryan [LW · GW], MATS should try to hire the PIBBSS folks to help with recruiting. IMO they tend to have the strongest participants of the programs on this chart which I'm familiar with (though high variance).