LessWrong 2.0 Reader

South Bay Meetup
DavidFriedman · 2023-01-30T23:35:22.817Z · comments (0)
Peter Thiel's speech at Oxford Debating Union on technological stagnation, Nuclear weapons, COVID, Environment, Alignment, 'anti-anti anti-anti-classical liberalism', Bostrom, LW, etc.
M. Y. Zuo · 2023-01-30T23:31:26.134Z · comments (33)
Medical Image Registration: The obscure field where Deep Mesaoptimizers are already at the top of the benchmarks. (post + colab notebook)
Hastings (hastings-greer) · 2023-01-30T22:46:31.352Z · comments (1)
Humans Can Be Manually Strategic
Screwtape · 2023-01-30T22:35:39.010Z · comments (0)
Why I hate the "accident vs. misuse" AI x-risk dichotomy (quick thoughts on "structural risk")
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-01-30T18:50:17.613Z · comments (41)
2022 Unofficial LessWrong General Census
Screwtape · 2023-01-30T18:36:30.616Z · comments (33)
[link] Call for submissions: “(In)human Values and Artificial Agency”, ALIFE 2023
the gears to ascension (lahwran) · 2023-01-30T17:37:48.882Z · comments (4)
What I mean by "alignment is in large part about making cognition aimable at all"
So8res · 2023-01-30T15:22:09.294Z · comments (25)
[link] The Energy Requirements and Feasibility of Off-World Mining
clans · 2023-01-30T15:07:59.872Z · comments (1)
[link] Whatever their arguments, Covid vaccine sceptics will probably never convince me
contrarianbrit · 2023-01-30T13:42:56.028Z · comments (10)
Simulacra Levels Summary
Zvi · 2023-01-30T13:40:00.774Z · comments (14)
A Few Principles of Successful AI Design
Vestozia (damien-lasseur) · 2023-01-30T10:42:25.960Z · comments (0)
Against Boltzmann mesaoptimizers
porby · 2023-01-30T02:55:12.041Z · comments (6)
How Likely is Losing a Google Account?
jefftk (jkaufman) · 2023-01-30T00:20:01.584Z · comments (11)
Model-driven feedback could amplify alignment failures
aogara (Aidan O'Gara) · 2023-01-30T00:00:28.647Z · comments (1)
Takeaways from calibration training
Olli Järviniemi (jarviniemi) · 2023-01-29T19:09:30.815Z · comments (1)
Structure, creativity, and novelty
TsviBT · 2023-01-29T14:30:19.459Z · comments (4)
What is the ground reality of countries taking steps to recalibrate AI development towards Alignment first?
[deleted] · 2023-01-29T13:26:39.705Z · comments (6)
Compendium of problems with RLHF
Charbel-Raphaël (charbel-raphael-segerie) · 2023-01-29T11:40:53.147Z · comments (16)
[link] formal alignment: what it is, and some proposals
Tamsin Leake (carado-1) · 2023-01-29T11:32:33.239Z · comments (3)
My biggest takeaway from Redwood Research REMIX
Alok Singh (OldManNick) · 2023-01-29T11:00:07.499Z · comments (0)
EA novel published on Amazon
Timothy Underwood (timothy-underwood-1) · 2023-01-29T08:33:35.853Z · comments (0)
Reverse RSS Stats
jefftk (jkaufman) · 2023-01-29T03:40:01.216Z · comments (2)
Why and How to Graduate Early [U.S.]
Tego · 2023-01-29T01:28:32.029Z · comments (5)
Stop-gradients lead to fixed point predictions
Johannes Treutlein (Johannes_Treutlein) · 2023-01-28T22:47:35.008Z · comments (2)
[link] Eli Dourado AMA on the Progress Forum
jasoncrawford · 2023-01-28T22:18:11.801Z · comments (0)
LW Filter Tags (Rationality/World Modeling now promoted in Latest Posts)
Ruby · 2023-01-28T22:14:32.371Z · comments (4)
No Fire in the Equations
Carlos Ramirez (carlos-ramirez) · 2023-01-28T21:16:24.896Z · comments (4)
Optimality is the tiger, and annoying the user is its teeth
Christopher King (christopher-king) · 2023-01-28T20:20:33.605Z · comments (6)
On not getting contaminated by the wrong obesity ideas
Natália (Natália Mendonça) · 2023-01-28T20:18:21.322Z · comments (68)
Advice I found helpful in 2022
Akash (akash-wasil) · 2023-01-28T19:48:23.160Z · comments (5)
The Knockdown Argument Paradox
Bryan Frances · 2023-01-28T19:23:02.678Z · comments (6)
Less Wrong/ACX Budapest Feb 4th Meetup
Richard Horvath · 2023-01-28T14:49:41.367Z · comments (0)
Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review)
Shoshannah Tekofsky (DarkSym) · 2023-01-28T05:26:49.866Z · comments (7)
A Simple Alignment Typology
Shoshannah Tekofsky (DarkSym) · 2023-01-28T05:26:36.660Z · comments (2)
[link] Spooky action at a distance in the loss landscape
Jesse Hoogland (jhoogland) · 2023-01-28T00:22:46.506Z · comments (4)
[link] WaPo: "Big Tech was moving cautiously on AI. Then came ChatGPT."
Julian Bradshaw · 2023-01-27T22:54:50.121Z · comments (5)
[link] Literature review of TAI timelines
Jsevillamol · 2023-01-27T20:07:38.186Z · comments (7)
[link] Scaling Laws Literature Review
Pablo Villalobos (pvs) · 2023-01-27T19:57:08.341Z · comments (1)
The role of Bayesian ML in AI safety - an overview
Marius Hobbhahn (marius-hobbhahn) · 2023-01-27T19:40:05.727Z · comments (6)
[link] to me, it's instrumentality that is alienating
Tamsin Leake (carado-1) · 2023-01-27T18:27:19.062Z · comments (0)
Assigning Praise and Blame: Decoupling Epistemology and Decision Theory
adamShimi · 2023-01-27T18:16:43.025Z · comments (5)
[question] How could humans dominate over a super intelligent AI?
Marco Discendenti (marco-discendenti) · 2023-01-27T18:15:55.760Z · answers+comments (8)
[link] ChatGPT understands language
philosophybear · 2023-01-27T07:14:42.790Z · comments (4)
Jar of Chocolate
jefftk (jkaufman) · 2023-01-27T03:40:03.163Z · comments (0)
Basics of Rationalist Discourse
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2023-01-27T02:40:52.739Z · comments (181)
The recent banality of rationality (and effective altruism)
CraigMichael · 2023-01-27T01:19:00.643Z · comments (7)
11 heuristics for choosing (alignment) research projects
Akash (akash-wasil) · 2023-01-27T00:36:08.742Z · comments (5)
A different observation of Vavilov Day
Elizabeth (pktechgirl) · 2023-01-26T21:50:01.571Z · comments (1)
All AGI Safety questions welcome (especially basic ones) [~monthly thread]
mwatkins · 2023-01-26T21:01:57.920Z · comments (81)
next page (older posts) →