LessWrong 2.0 Reader

[link] My MATS Summer 2023 experience
James Chua (james-chua) · 2024-03-20T11:26:14.944Z · comments (0)

Throughput vs. Latency
alkjash · 2024-01-12T21:37:07.632Z · comments (2)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

End-to-end hacking with language models
tchauvin (timot.cool) · 2024-04-05T15:06:53.689Z · comments (0)

Non-myopia stories
lberglund (brglnd) · 2023-11-13T17:52:31.933Z · comments (10)

Quick Thoughts on Our First Sampling Run
jefftk (jkaufman) · 2024-05-23T00:20:02.050Z · comments (3)

[link] What fuels your ambition?
Cissy · 2024-01-31T18:30:53.274Z · comments (1)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (52)

[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)

[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

[link] The Poker Theory of Poker Night
omark · 2024-04-07T09:47:01.658Z · comments (13)

[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

Results from the Turing Seminar hackathon
Charbel-Raphaël (charbel-raphael-segerie) · 2023-12-07T14:50:38.377Z · comments (1)

Experience Report - ML4Good AI Safety Bootcamp
Kieron Kretschmar · 2024-04-11T18:03:41.040Z · comments (0)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)

Deception Chess: Game #2
Zane · 2023-11-29T02:43:22.375Z · comments (17)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (6)

Ackshually, many worlds is wrong
tailcalled · 2024-04-11T20:23:59.416Z · comments (42)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)

Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau (jacob-pfau) · 2024-02-20T00:02:09.575Z · comments (6)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)

[link] Cellular reprogramming, pneumatic launch systems, and terraforming Mars: Some things I learned about at Foresight Vision Weekend
jasoncrawford · 2024-01-04T19:33:57.887Z · comments (0)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)

AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)

Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)

Deconfusing “ontology” in AI alignment
Dylan Bowman (dylan-bowman) · 2023-11-08T20:03:43.205Z · comments (3)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

[link] Memo on some neglected topics
Lukas Finnveden (Lanrian) · 2023-11-11T02:01:55.834Z · comments (2)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

Recent comments

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Personally, I read approximately everything you (Elizabeth) write on the Forum and LW, and occasionally cite it to others in EA leadership world. That's why I'm pretty sure your work has had nontrivial impact. I am not too surprised that its impact hasn't become apparent to you though.
[...]
I don't see solutions or great ways forward yet, and I sense that nobody really does

That does sound like learned helplessness, and like the EA leadership filters out people who would see ways forward.

Let me give you one:

If people in EA considered her critiques to have real value, the obvious step would be to give Elizabeth money to write more. Given that she has a Patreon, the way to give her money is pretty straightforward. If the writing influences what happens in EV board discussions, paying Elizabeth for the value she provides to the board would be straightforward.

If she were paid decently, I would expect her to feel she's making an impact.

Paying Elizabeth might not be the solution to all of EA's problems, but it's a way to signal priorities. Estimate the value she provides to EA, pay her for that value, and have EV publicly publish a writeup stating how much value EV thinks she provides to EA and how much EV paid her.

fabien-roger on the case for CoT unfaithfulness is overstated

I like this post. I made similar observations and arguments in The Translucent Thoughts Hypotheses and Their Implications, but these were less clear and buried at the end of the post.

simonm on Derivative AT a discontinuity

Your definition of the Heaviside step function has H(0) = 1.
Your definition of L has L(0) = 1/2, so you're not really taking the derivative of the same function.

I don't really believe nonstandard analysis helps us differentiate the Heaviside step function. You have found a function that is quite a lot like the step function and shown that it has a derivative (maybe), but I would need to be convinced that all functions have the same derivative to be convinced that something meaningful is going on. (And since all your derivatives have different values, this does not seem like a useful definition of a derivative.)

abstractapplic on D&D Sci Coliseum: Arena of Data

I tried fitting a model with only "Strength diff plus 8 times sign(speed diff)" as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn't have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.

Alternatively

I might just have screwed up my code somehow.

Still . . .

I'm sticking with my choices for now.

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

The early signs have been promising.

What concrete things did he change at CEA that are promising signs?

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

If I say that other psychiatrists at the conference are engaging in an ethical lapse when they charge late fees to poor people then I'm engaging in an uncomfortable interpersonal conflict. It's about personal incentives that actually matter a lot to the day-to-day practice of psychiatry.

While the psychiatrists are certainly aware that they charge poor people late fees, they likely think of it as business as usual rather than as an ethical issue.

If we take Scott's example of psychiatrists talking about racism being a problem in psychiatry, I don't think the problem is that racism is unimportant. The problem is rather that you can score points by virtue signaling about the problem and find common ground around that signaling if you are willing to burn a few scapegoats, while talking about the issue of charging poor people late fees is divisive.

Washington DC is one of the most liberal places in the US, full of people who are good at virtue signaling and pretending they care about "solving systematic racism", yet they passed a bill to require college degrees for childcare services. If you apply the textbook definition of systematic racism, requiring college degrees for childcare services is about creating a system that prevents poor Black people from looking after children.

Systematic racism that prevents poor Black people from offering childcare services is bad, but the people in Washington DC are good at rationalising. The whole discourse about racism is of a nature where people score points by virtue signaling about how much they care about fighting racism. They practice steelmanning racism all the time and steelmanning the concept of systematic racism, and yet they pass systematically racist laws because they don't like poor Black people looking after their children.

If you tell White people in Washington DC who are already steelmanning systematic racism to the best of their ability that they should steelman it more because they are still inherently racist, they might even agree with you, but it's not what's going to make them change the laws so that more poor Black people will look after their children.

That tactic helps reduce ignorance of the "other side" on the issues that get the steelmanning discussion

If you want to reduce ignorance of the "other side", listening to the other side is better than trying to steelman it. Eliezer explained the problems with steelmanning well in his interview with Lex Fridman.

Also, in judging a strategy, we should know what resources we assume we have (e.g. "the meetup leader is following the practice we've specified and is willing to follow 'reasonable' requests or suggestions from us"), and know what threats we're modeling.

Yes, as far as resources go, you have to keep in mind that all the people involved have their own interests.

When it comes to threat modelling, reading through Ben Hoffman's critique of GiveWell, based on his employment there, gives you a good idea of what you want to model.

david-althaus on What is malevolence? On the nature, measurement, and distribution of dark traits

Thanks, good point! I suppose it's a balancing act and depends on the specifics in question and the amount of shame we dole out. My hunch would be that a combination of empathy and shame ("carrot and stick") may be best.

david-althaus on What is malevolence? On the nature, measurement, and distribution of dark traits

I agree that the problem of "evil" is multifactorial with individual personality traits being only one of several relevant factors, with others like "evil/fanatical ideologies" or misaligned incentives/organizations plausibly being overall more important. Still, I think that ignoring the individual character dimension is perilous.

It seems to me that most people become much more evil when they aren't punished for it. [...] So if we teach AIs to be as "aligned" as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated as a much-less-powerful group in history - which is to say, not very well.

Makes sense. On average, power corrupts / people become more malevolent if no one holds them accountable, but again, there seem to exist interindividual differences, with some people behaving much better than others even when they have enormous power (cf. this section).

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

The problem is that even small differences in values can lead to massive differences in outcomes when the difference lies in caring about truth while the other values stay similar. As Elizabeth wrote, "Truthseeking is the ground in which other principles grow."

niplav on shortplav

Apparently a Thompson-hack-like bug occurred in LLVM (haven't read the post in detail yet). Interesting.