LessWrong 2.0 Reader

[link] Why I’m not a Bayesian
Richard_Ngo (ricraz) · 2024-10-06T15:22:45.644Z · comments (92)

[link] Daniel Kahneman has died
DanielFilan · 2024-03-27T15:59:14.517Z · comments (11)

Humming is not a free $100 bill
Elizabeth (pktechgirl) · 2024-06-06T20:10:02.457Z · comments (6)

Introducing Alignment Stress-Testing at Anthropic
evhub · 2024-01-12T23:51:25.875Z · comments (23)

Safety consultations for AI lab employees
Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · comments (4)

"Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity
Thane Ruthenis · 2023-12-16T20:08:39.375Z · comments (34)

Skills from a year of Purposeful Rationality Practice
Raemon · 2024-09-18T02:05:58.726Z · comments (18)

Contra papers claiming superhuman AI forecasting
nikos (followtheargument) · 2024-09-12T18:10:50.582Z · comments (16)

re: Yudkowsky on biological materials
bhauth · 2023-12-11T13:28:10.639Z · comments (30)

[link] Toward a Broader Conception of Adverse Selection
Ricki Heicklen (bayesshammai) · 2024-03-14T22:40:57.920Z · comments (61)

Every "Every Bay Area House Party" Bay Area House Party
Richard_Ngo (ricraz) · 2024-02-16T18:53:28.567Z · comments (6)

[question] Why is o1 so deceptive?
abramdemski · 2024-09-27T17:27:35.439Z · answers+comments (24)

[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (22)

This is already your second chance
Malmesbury (Elmer of Malmesbury) · 2024-07-28T17:13:57.680Z · comments (13)

WTH is Cerebrolysin, actually?
gsfitzgerald (neuroplume) · 2024-08-06T20:40:53.378Z · comments (23)

Struggling like a Shadowmoth
Raemon · 2024-09-24T00:47:05.030Z · comments (38)

Effective Aspersions: How the Nonlinear Investigation Went Wrong
TracingWoodgrains (tracingwoodgrains) · 2023-12-19T12:00:23.529Z · comments (170)

Critical review of Christiano's disagreements with Yudkowsky
Vanessa Kosoy (vanessa-kosoy) · 2023-12-27T16:02:50.499Z · comments (40)

Timaeus's First Four Months
Jesse Hoogland (jhoogland) · 2024-02-28T17:01:53.437Z · comments (6)

'Empiricism!' as Anti-Epistemology
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-03-14T02:02:59.723Z · comments (90)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

Did Christopher Hitchens change his mind about waterboarding?
Isaac King (KingSupernova) · 2024-09-15T08:28:09.451Z · comments (22)

[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)

2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)

Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)

The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda
Cameron Berg (cameron-berg) · 2023-12-18T20:35:01.569Z · comments (21)

Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (86)

How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (54)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (69)

[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)

Is being sexy for your homies?
Valentine · 2023-12-13T20:37:02.043Z · comments (92)

Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)

[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (36)

You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)

The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)

And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)

Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)

[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)

DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)

[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)

[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)

Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)

Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (55)

What’s up with LLMs representing XORs of arbitrary features?
Sam Marks (samuel-marks) · 2024-01-03T19:44:33.162Z · comments (61)

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 2024-05-21T20:15:36.502Z · comments (16)

Formal verification, heuristic explanations and surprise accounting
Jacob_Hilton · 2024-06-25T15:40:03.535Z · comments (11)

Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (25)

Recent comments

donatas-luciunas on Alignment is not intelligent

I am sure you can't prove your position. And I am sure I can prove my position.

Your reasoning is based on the assumption that all value is known. If the utility function assigns value to something, it is valuable. If the utility function does not assign value, it is not valuable. But the truth is that something might be valuable even though your utility function does not know it yet. It would be more intelligent to use three categories: valuable, not valuable, and unknown.

Let's say you are booking a flight and you have the option to get checked baggage for free. To the best of your current knowledge, it's absolutely not relevant to you. But you understand that your knowledge might change, and it costs nothing to keep more options open, so you take the checked baggage.

Let's say you are a traveler, a wanderer. You have limited space in your backpack. Sometimes you find items and you need to choose whether to put them in the backpack or not. You definitely keep items that are useful. You leave behind items that are not useful. What do you do if you find an item whose usefulness is unknown? Some mysterious item. Take it if it is small, leave it if it is big? According to you, it is obvious to leave it. That does not sound intelligent to me.

We can draw a little decision matrix:

Leave item
- no burden 👍
- no opportunity to use it
Take item
- a burden 👎
- may be useful, may be harmful, may have no effect
- knowledge about usefulness of an item 👍

Don't you think that "knowledge about usefulness of an item" can sometimes be worth "a burden"? Basically, I described the concept of an experiment here.
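The decision matrix above can be sketched as a small expected-value comparison. This is only an illustration: every number, the function names, and the idea of folding "knowledge about the item" into a single `info_value` term are assumptions made for the sketch, not something stated in the comment.

```python
# Illustrative sketch of the "leave item vs. take item" matrix.
# All parameters are hypothetical; none come from the original comment.

def expected_value_leave() -> float:
    """Leaving the item: no burden, no opportunity, no new knowledge."""
    return 0.0


def expected_value_take(p_useful: float, p_harmful: float,
                        gain: float, loss: float,
                        burden: float, info_value: float) -> float:
    """Taking the item: pay the burden, face an uncertain outcome,
    and learn what the item actually does (info_value)."""
    p_neutral = 1.0 - p_useful - p_harmful
    if p_neutral < 0:
        raise ValueError("p_useful + p_harmful must not exceed 1")
    # The neutral outcome contributes nothing to the expected outcome.
    outcome = p_useful * gain - p_harmful * loss
    return outcome - burden + info_value


def should_take(**kwargs: float) -> bool:
    """Take the item only if it beats leaving it in expectation."""
    return expected_value_take(**kwargs) > expected_value_leave()


# An item that is as likely to be harmful as useful:
# without any value placed on knowledge, the burden decides.
print(should_take(p_useful=0.2, p_harmful=0.2, gain=10, loss=10,
                  burden=1, info_value=0))
# With enough value placed on learning what the item is, taking it wins.
print(should_take(p_useful=0.2, p_harmful=0.2, gain=10, loss=10,
                  burden=1, info_value=3))
```

Under these made-up numbers the choice flips purely because of the `info_value` term, which is the point the matrix is making about experiments.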

We are deep in a rabbit hole, but I hope you understand the importance. If intelligence and goals are coupled (I believe they are), all current alignment research is dangerously misleading.

davidmanheim on (Salt) Water Gargling as an Antiviral

Do I understand correctly that the blue-green graph has a y-axis that goes above 100% median reduction, with error bars in that range? (This would happen if they estimated a proportion as a standard variable - not great practice, but I want to check that it is what happened.)
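One way error bars can extend past 100% is a normal-approximation (Wald) interval applied to a proportion near 1. The sketch below is a generic illustration with made-up counts; it assumes a simple binomial proportion and is not a reconstruction of how the graph in question was produced. The Wilson interval is included only as a contrast, since it stays inside [0, 1].

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval; can fall outside [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval; always stays within [0, 1]."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical data: 19 successes out of 20 trials.
print(wald_interval(19, 20))    # upper bound exceeds 1.0
print(wilson_interval(19, 20))  # upper bound stays below 1.0
```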

q-home on Making a conservative case for alignment

Napoleon is merely an argument for "just because you strongly believe it, even if it is a statement about you, does not necessarily make it true".

When people make arguments, they often don't list all of the premises. That's not unique to trans discourse. Informal reasoning is hard to make fully explicit. "Your argument doesn't explicitly exclude every counterexample" is a pretty cheap counter-argument. What people experience is important evidence and an important factor; it's rational to bring it up instead of stopping yourself with "wait, I'm not allowed to bring that up unless I make an analytically bulletproof argument". For example, if you trust someone when they say they feel strongly about being a woman, there's no reason to suspect them of being a cosplayer who chases Twitter popularity.

I expect that you will disagree with a lot of this, and that's okay; I am not trying to convince you, just explaining my position.

I think I still don't understand the main conflict which bothers you. I thought it was "I'm not sure if trans people are deluded in some way (like Napoleons, but milder) or not". But now it seems like "I think some people really suffer and others just cosplay, the cosplayers take something away from true sufferers". What is taken away?

jblack on What epsilon do you subtract from "certainty" in your own probability estimates?

For all practical purposes, such credences don't matter. Such scenarios certainly can and do happen, but in almost all cases there's nothing you can do about them without exceeding your own bounded rationality and agency.

If the stakes are very high then it may make sense to consider the probability of some sort of trick, and attempt to get further evidence of the physical existence of the coin and that its current state matches what you are seeing.

There is essentially no point in assigning probabilities to hypotheses of failures of your mind itself. You can't reason your way out of serious mind malfunction using arithmetic. At best you could hope to recognize that it is malfunctioning, and try not to do anything that will make things worse. In the case of mental impairment severe enough to have false memories or sensations this blatant, a rational person should expect that a person so affected wouldn't be capable of correctly carrying out quantified Bayesian reasoning.

My own background credences are generally not insignificant for something like this or even stranger, but they play essentially zero role in my life and definitely not in any probability calculations. Such hypotheses are essentially untestable and unactionable.

arturo-macias on Arthropod (non) sentience

We are surprisingly high in forebrain neuron count:

https://en.m.wikipedia.org/wiki/List_of_animals_by_number_of_neurons

peterbarnett on Daniel Kokotajlo's Shortform

I've been playing around with Suno, inspired by this Rob Long Tweet: https://x.com/rgblong/status/1857233734640222364
I've been pretty shocked at how easily it makes music that I want to listen to (mainly slightly cringe midwest emo): https://suno.com/song/1a5a1edf-9711-4ca4-a2f7-ef814ca298b4

zero-contradictions on Eugenics Performed By A Blind, Idiot God

I don't believe that gene-editing is a viable solution to preventing dysgenics for the entire population.

Unregulated reproduction has the potential to harm others, so it's reasonable to regulate it.

seth-herd on Should you increase AI alignment funding, or increase AI regulation?

The reason this is a difficult question is that we don't know how hard alignment will be. Opinions from different people with best-in-class expertise and time-on-task disagree wildly.

Therefore I'd argue that we should throw effort and funding into resolving that question by putting the reasoning processes of the relevant experts to wider scrutiny, and do a more systematic job of evaluating them.

Funding comes from a different resource pool than regulation, so you might mean which one should get your advocacy efforts. The same arguments apply to both of them, and to the meta-alignment question.

seth-herd on How can we prevent AGI value drift?

I wish the odds for getting AGI into trustworthy hands were better. The source of my optimism is the hope that those hands just need to be decent: to have what I've conceptualized as a positive empathy-sadism balance. That's anyone who's not a total sociopath (lacking empathy and tending toward vengeance and competition) and/or sadist. I hope that about 90-99% of humanity would eventually make the world vastly better with their AGI, just because it's trivially easy for them to do, so it only requires the smallest bit of goodwill.

I wish I were more certain of that. I've tried to look a little at some historical examples of rulers born into power and with little risk of losing it. A disturbing number of them were quite callous rulers. They were usually surrounded by a group of advisors that got them to ignore the plight of the masses and focus on the concerns of an elite few. But this situation isn't analogous - once your AGI hits superintelligence, it would be trivially easy to both help the masses in profound ways, and pursue whatever crazy schemes you and your friends have come up with. Thus my limited optimism.

WRT the distributed power structure of Western governments: I think AGI would be placed under executive authority, like the armed forces, and the US president and those with similar roles in other countries would hold near-total power, should they choose to use it. They could transform democracies into dictatorships with ease. And we very much do continue to elect selfish and power-hungry individuals, some of whom probably actually have a negative empathy-sadism balance.

Looking back, I note that you said I argued for "good odds" while I said "decent odds". We may be in agreement on the odds.

But there's more to consider here. Thanks again for engaging; I'd like to get more discussion of this topic going. I doubt you or I are seeing all of the factors that will be obvious in retrospect yet.

leogao on leogao's Shortform

the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.

- at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
- when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carries a lot of the value
- at work, a lot of the best conversations happen not in scheduled 1:1s and group meetings, but rather in spontaneous hallway or dinner groups