LessWrong 2.0 Reader


[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)

The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

Interpreting Quantum Mechanics in Infra-Bayesian Physicalism
Yegreg · 2024-02-12T18:56:03.967Z · comments (6)

Verifiable private execution of machine learning models with Risc0?
mako yass (MakoYass) · 2023-10-25T00:44:48.643Z · comments (2)

Putting multimodal LLMs to the Tetris test
Lovre · 2024-02-01T16:02:12.367Z · comments (5)

Information-Theoretic Boxing of Superintelligences
JustinShovelain · 2023-11-30T14:31:11.798Z · comments (0)

Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)

Differential Optimization Reframes and Generalizes Utility-Maximization
J Bostock (Jemist) · 2023-12-27T01:54:22.731Z · comments (2)

Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)

[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)

The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)

RA Bounty: Looking for feedback on screenplay about AI Risk
Writer · 2023-10-26T13:23:02.806Z · comments (6)

Understanding Subjective Probabilities
Isaac King (KingSupernova) · 2023-12-10T06:03:27.958Z · comments (16)

[link] When scientists consider whether their research will end the world
Harlan · 2023-12-19T03:47:06.645Z · comments (4)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (3)

Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)

Protestants Trading Acausally
Martin Sustrik (sustrik) · 2024-04-01T14:46:26.374Z · comments (4)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery (arjun-panickssery) · 2024-08-06T17:44:27.293Z · comments (0)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

[question] What are things you're allowed to do as a startup?
Elizabeth (pktechgirl) · 2024-06-20T00:01:59.257Z · answers+comments (9)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

Against "argument from overhang risk"
RobertM (T3t) · 2024-05-16T04:44:00.318Z · comments (11)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (1)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

Quick Thoughts on Our First Sampling Run
jefftk (jkaufman) · 2024-05-23T00:20:02.050Z · comments (3)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)

[link] The Poker Theory of Poker Night
omark · 2024-04-07T09:47:01.658Z · comments (13)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)

Experience Report - ML4Good AI Safety Bootcamp
Kieron Kretschmar · 2024-04-11T18:03:41.040Z · comments (0)


Recent comments

abstractapplic on D&D Sci Coliseum: Arena of Data

I tried fitting a model with only "Strength diff plus 8 times sign(speed diff)" as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn't have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.

Alternatively

I might just have screwed up my code somehow.

Still . . .

I'm sticking with my choices for now.

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

The early signs have been promising.

What concrete things did he change at CEA that are promising signs?

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

If I say that other psychiatrists at the conference are engaging in an ethical lapse when they charge late fees to poor people then I'm engaging in an uncomfortable interpersonal conflict. It's about personal incentives that actually matter a lot to the day-to-day practice of psychiatry.

While the psychiatrists are certainly aware that they charge poor people late fees, they likely think of it as business as usual rather than as an ethical issue.

If we take Scott's example of psychiatrists talking about racism being a problem in psychiatry, I don't think the problem is that racism is unimportant. The problem is rather that you can score points by virtue signaling about it and find common ground in that virtue signaling, as long as you are willing to burn a few scapegoats, whereas talking about charging poor people late fees is divisive.

Washington DC is one of the most liberal places in the US, full of people who are good at virtue signaling and pretending they care about "solving systemic racism", yet they passed a bill requiring college degrees for childcare workers. If you apply the textbook definition of systemic racism, requiring college degrees for childcare work amounts to creating a system that prevents poor Black people from looking after children.

Systemic racism that prevents poor Black people from offering childcare services is bad, but the people in Washington DC are good at rationalizing. The whole discourse about racism is one where people score points by virtue signaling about how much they care about fighting racism. They practice steelmanning the concept of systemic racism all the time, and yet they pass systemically racist laws because they don't like poor Black people looking after their children.

If you tell White people in Washington DC, who are already steelmanning systemic racism to the best of their ability, that they should steelman it more because they are still inherently racist, they might even agree with you, but that is not what will make them change the laws so that more poor Black people can look after their children.

That tactic helps reduce ignorance of the "other side" on the issues that get the steelmanning discussion

If you want to reduce ignorance of the "other side", listening to the other side is better than trying to steelman it. Eliezer explained the problems with steelmanning well in his interview with Lex Fridman.

Also, in judging a strategy, we should know what resources we assume we have (e.g. "the meetup leader is following the practice we've specified and is willing to follow 'reasonable' requests or suggestions from us"), and know what threats we're modeling.

Yes, as far as resources go, you have to keep in mind that everyone involved has their own interests.

When it comes to threat modeling, reading through Ben Hoffman's critique of GiveWell, based on his time working there, gives you a good idea of what you want to model.

david-althaus on What is malevolence? On the nature, measurement, and distribution of dark traits

Thanks, good point! I suppose it's a balancing act and depends on the specifics in question and the amount of shame we dole out. My hunch would be that a combination of empathy and shame ("carrot and stick") may be best.

david-althaus on What is malevolence? On the nature, measurement, and distribution of dark traits

I agree that the problem of "evil" is multifactorial with individual personality traits being only one of several relevant factors, with others like "evil/fanatical ideologies" or misaligned incentives/organizations plausibly being overall more important. Still, I think that ignoring the individual character dimension is perilous.

It seems to me that most people become much more evil when they aren't punished for it. [...] So if we teach AIs to be as "aligned" as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated as a much-less-powerful group in history - which is to say, not very well.

Makes sense. On average, power corrupts / people become more malevolent if no one holds them accountable, but again, there seem to exist interindividual differences, with some people behaving much better than others even when they have enormous power (cf. this section).

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

The problem is that even small differences in values can lead to massive differences in outcomes when the difference is caring about truth while the other values stay similar. As Elizabeth wrote in "Truthseeking is the ground in which other principles grow".

niplav on shortplav

Apparently a bug resembling Ken Thompson's "Trusting Trust" compiler hack occurred in LLVM (haven't read the post in detail yet). Interesting.

alex-herwix on Taking nonlogical concepts seriously

Thank you for an interesting post! I have only skimmed it so far and have not really dug into the mathematics section, but the way you frame logic somewhat reminds me of Dewey, J. (1938). Logic: The Theory of Inquiry. Henry Holt and Company, Inc.

Are you by any chance familiar with this work and could elaborate on possible continuities and discontinuities?

rudi-c on BIG-Bench Canary Contamination in GPT-4

If they were to exclude all documents with the canary, everyone would include the canary to avoid being scraped.

chris_leong on Introducing Transluce — A Letter from the Founders

One thing I would love to know is how it will work on Claude 3.5 Sonnet or GPT-4o, given that these models aren't open-weights. Do you have access to some reduced set of capabilities for these?