LessWrong 2.0 Reader

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (27)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

How I started believing religion might actually matter for rationality and moral philosophy
zhukeepa · 2024-08-23T17:40:47.341Z · comments (18)

Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)

Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)

On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (15)

Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (32)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)

Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)

[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)

Multiplex Gene Editing: Where Are We Now?
sarahconstantin · 2024-07-16T20:50:04.590Z · comments (6)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (30)

If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (65)

Why Large Bureaucratic Organizations?
johnswentworth · 2024-08-27T18:30:07.422Z · comments (51)

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (4)

[link] GPT-4o System Card
Zach Stein-Perlman · 2024-08-08T20:30:52.633Z · comments (11)

AI #79: Ready for Some Football
Zvi · 2024-08-29T13:30:10.902Z · comments (16)

The Hessian rank bounds the learning coefficient
Lucius Bushnaq (Lblack) · 2024-08-08T20:55:36.960Z · comments (9)

Timaeus is hiring!
Jesse Hoogland (jhoogland) · 2024-07-12T23:42:28.651Z · comments (4)

[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)

Brief notes on the Wikipedia game
Olli Järviniemi (jarviniemi) · 2024-07-14T02:28:22.473Z · comments (9)

An AI Race With China Can Be Better Than Not Racing
niplav · 2024-07-02T17:57:36.976Z · comments (31)

Indecision and internalized authority figures
Kaj_Sotala · 2024-07-06T10:10:02.528Z · comments (1)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (3)

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

Friendship is transactional, unconditional friendship is insurance
Ruby · 2024-07-17T22:52:41.967Z · comments (24)

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd · 2024-08-05T15:38:09.682Z · comments (20)

[link] Static Analysis As A Lifestyle
adamShimi · 2024-07-03T18:29:37.384Z · comments (11)

Advice to junior AI governance researchers
Akash (akash-wasil) · 2024-07-08T19:19:07.316Z · comments (1)

Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

AI #78: Some Welcome Calm
Zvi · 2024-08-22T14:20:10.812Z · comments (15)

Recent comments

tsvibt on Why I funded PIBBSS

Reminder that you have a moral obligation, every single time you're communicating an overall justification of alignment work premised on slow takeoff, in a context where you can spare two sentences without unreasonable cost, to say out loud something to the effect of "Oh, and by the way, just so you know, the causal reason I'm talking about this work is that it seems tractable, and the causal reason is not that this work matters." If you don't, you're spraying your [slipping sideways out of reality] on everyone else.

yanni-kyriacos on yanni's Shortform

I beta tested a new movement building format last night: online networking. It seems to have legs.

V quick theory of change:
> problem to solve: not enough people in AIS across Australia (especially) and New Zealand are meeting each other (this is bad for the movement and for people's impact).
> we need to brute force serendipity to create collabs.
> this initiative has v low cost

quantitative results:
> I purposefully didn't market it hard because it was a beta. I literally got more people than I hoped for
> 22 RSVPs and 18 attendees
> this says to me I could easily get 40+
> the average score for the question below was 7.27, which is very good for a beta test

I used Zoom, which was extremely clunky. These results suggest to me I should:
> invest in software designed for this use case, not Zoom
> segment by career stream (governance vs. technical) and/or experience (beginner vs. advanced)
> run it every second month

I have heaps of qualitative feedback from participants but don't have time to share it here.

Email me if interested: yanni@aisafetyanz.com.au

matthew-barnett on In defense of technological unemployment as the main AI concern

> I mean like a dozen people have now had long comment threads with you about this. I doubt this one is going to cross this seemingly large inferential gap.

I think it's still useful to ask for concise reasons for certain beliefs. "The Fundamental Question of Rationality is: 'Why do you believe what you believe?'"

Your reasons could be different from the reasons other people give, and indeed, some of your reasons seem to be different from what I've heard from many others.

> The short answer is that from the perspective of AI it really sucks to have basically all property be owned by humans

For what it's worth, I don't think humans need to own basically all property in order for AIs to obey property rights. A few alternatives come to mind: humans could have a minority share of the wealth, and AIs could have property rights with each other.

j_thomas_moros on How to choose what to work on

Thanks for the summary of various models of how to figure out what to work on. While reading it, I couldn't help focusing on my frustration with the "getting paid for it" part. Personally, I want to create a new programming language. I think we are still in the dark ages of computer programming and that programming languages suck. I can't make a perfect language, but I can take a solid step in the right direction. The world could sure use a better programming language, if you ask me. I'm passionate about this project. I'm a skilled software developer with a longer career than all the young guns I see, and I think I've proved with my work so far that I am a top-tier language designer capable of writing a compiler and standard library.

But... this is almost the definition of something you can't and won't be paid for, at least not until you've already published a successful language. That fact contributes greatly to why we can't have better programming languages: no one can afford to let them incubate as long as needed. Because of limited resources, everyone has to push to release as fast as possible. Unlike other software, languages have very strict backward-compatibility requirements, so improving them is a challenge, and rushed design inevitably leads to real issues as the language grows over time. Languages can never go back to fix previous mistakes or make the design changes needed to support new features.

ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Anthropics

jchan on Inquisitive vs. adversarial rationality

This can be a great time-saver because it relies on each party to present the best possible case for their side. This means I don't have to do any evidence-gathering myself; I just need to evaluate the arguments presented, with that heuristic in mind. For example, if the pro-X side cites a bunch of sources in favor of X, but I look into them and find them unconvincing, then this is pretty good evidence against X, and I don't have to go combing through all the other sources myself. The mere existence of bad arguments for X is not in itself evidence against X, but the fact that they're presented as the best possible arguments is.

Of course the problem is, outside of a legal proceeding, parties rarely have that strong an incentive to dig up the best possible arguments. Their time is limited as well, and they don't really suffer much consequence from failing to convince you. Also, the discussion medium might structurally impede the best arguments from being given (e.g. replies in a Twitter thread need to be posted quickly or else nobody will see them). Or worse yet, a skilled propaganda campaign can flood the zone with bad pro-X arguments from personages who appear to be pro-X but are secretly against it, knowing that the audience is going to be evaluating these arguments using the adversarial heuristic.
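The adversarial heuristic in this comment can be read as a Bayesian update on the observation "the best case offered for X is weak." What follows is a minimal sketch, not part of the original comment: the function name and every probability in it are hypothetical, chosen only to illustrate that the same weak case is strong evidence against X when advocates are diligent, and much weaker evidence when they lack the incentive to dig up their best arguments.

```python
def posterior_given_weak_case(prior: float, p_weak_if_true: float, p_weak_if_false: float) -> float:
    """Hypothetical helper: P(X | the best case presented for X is weak), via Bayes' rule."""
    numerator = prior * p_weak_if_true
    evidence = numerator + (1 - prior) * p_weak_if_false
    return numerator / evidence


# Illustrative numbers only (assumptions, not data).
# Diligent advocates: if X were true, they would rarely fail to find a strong case.
diligent = posterior_given_weak_case(prior=0.5, p_weak_if_true=0.1, p_weak_if_false=0.9)
# Lax advocates: a weak case is common even when X is true.
lax = posterior_given_weak_case(prior=0.5, p_weak_if_true=0.6, p_weak_if_false=0.9)

print(f"diligent advocates: {diligent:.2f}")  # 0.10
print(f"lax advocates: {lax:.2f}")            # 0.40
```

Starting from the same 50% prior, the weak case drops credence in X to 10% in the diligent scenario but only to 40% in the lax one, which is the second paragraph's worry expressed numerically.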

ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Decision Theory

ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Agent Foundations

ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Natural Latents

ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Infra-Bayesianism