LessWrong 2.0 Reader

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (15)

[question] What's the Right Way to think about Information Theoretic quantities in Neural Networks?
Dalcy (Darcy) · 2025-01-19T08:04:30.236Z · answers+comments (13)

Games for AI Control
charlie_griffin (cjgriffin) · 2024-07-11T18:40:50.607Z · comments (0)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

Dream, Truth, & Good
abramdemski · 2025-02-24T16:59:05.045Z · comments (8)

[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)

Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
Liron · 2025-01-02T04:42:20.362Z · comments (14)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (3)

MATS mentor selection
DanielFilan · 2025-01-10T03:12:52.141Z · comments (11)

Evolution and the Low Road to Nash
Aydin Mohseni (aydin-mohseni) · 2025-01-22T07:06:32.305Z · comments (2)

Dmitry's Koan
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-10T04:27:30.346Z · comments (8)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

Zvi’s 2024 In Movies
Zvi · 2025-01-13T13:40:05.488Z · comments (4)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (2)

[link] Alignment Is Not All You Need
Adam Jones (domdomegg) · 2025-01-02T17:50:00.486Z · comments (10)

ARENA 4.0 Impact Report
Chloe Li (chloe-li-1) · 2024-11-27T20:51:54.844Z · comments (3)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (40)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace · 2024-02-23T07:30:07.461Z · comments (6)

List your AI X-Risk cruxes!
Aryeh Englander (alenglander) · 2024-04-28T18:26:19.327Z · comments (7)

[link] Soviet comedy film recommendations
Nina Panickssery (NinaR) · 2024-06-09T23:40:58.536Z · comments (11)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

Deep Learning is cheap Solomonoff induction?
Lucius Bushnaq (Lblack) · 2024-12-07T11:00:56.455Z · comments (1)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed
johnswentworth · 2024-08-22T19:19:28.940Z · comments (4)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

Causal inference for the home gardener
braces · 2024-11-27T17:55:52.629Z · comments (1)


Recent comments

ryan_greenblatt on Training AI to do alignment research we don’t already know how to do

I certainly agree it isn't clear, just my current best guess.

jeremy-gillen on Training AI to do alignment research we don’t already know how to do

It's not entirely clear to me that the math works out for AIs being helpful on net relative to humans just doing it, because of the supervision required, and the trust and misalignment issues.

But on this question (for AIs that are just capable of "prosaic and relatively unenlightened ML research") it feels like shot-in-the-dark guesses. It's very unclear to me what is and isn't possible.

angela-pretorius on what an efficient market feels from inside

Some landlords offer cheap rent but only rent to childless non-smokers who have a perfect credit score and a well-paid job (preferably a job that requires security clearance or an enhanced DBS check) and whose demographic characteristics are not too similar to those of their previous bad tenants.

Other landlords are less selective about who they rent to but charge much higher rents to compensate for the risk of property damage and rent arrears. This is why properties that are very similar in quality may charge wildly different rents.

ryan_greenblatt on Training AI to do alignment research we don’t already know how to do

On some axes, but won't there be axes where AIs are more difficult than humans also? Sycophancy & slop being the most salient. Misalignment issues being another.

Yes, I just meant on net. (Relative to the current ML community and given a similar fraction of resources to spend on AI compute.)

nikola-jurkovic on nikola's Shortform

All of the above but it seems pretty hard to have an impact as a high schooler, and many impact avenues aren't technically "positions" (e.g. influencer)
I think that everyone except "Extremely resilient individuals who expect to get an impactful position (including independent research) very quickly" is probably better off following the strategy.

lsusr on Luna Lovegood and the Chamber of Secrets - Part 10

That's a good point. I've changed it to "wokking".

samuelshadrach on xpostah's Shortform

If I understand correctly it is possible to find $300/mo/bedroom accommodation in rural US today, and a large enough city will compress city rents down to rural rents. A govt willing to pursue a plan as interesting as this one may also be able to increase immigrant labour to build the houses and relax housing regulations. US residential rents are artificially high compared to the global average. (In some parts of the world, a few steel sheets (4 walls + roof) are sufficient to count as a house, even water and sewage piping in every house is not mandatory as long as residents can access toilets and water supply within walking distance.)

(A gigacity could also increase rents because it'll increase the incomes of even its lowest income members. But yeah in general now you need to track median incomes of 1B people to find out new equilibrium.)

ricraz on nikola's Shortform

I found this tricky to parse because of two phrasing issues:

The post depends a lot on what you mean by "school" (high school versus undergrad).
I feel confused about what claim you're making about the waiting room strategy: you say that some people shouldn't use it, but you don't actually claim that anyone in particular should use it. So are you just mentioning that it's a possible strategy? Or are you implying that it should be the default strategy?

jeremy-gillen on Training AI to do alignment research we don’t already know how to do

Thanks, I appreciate the draft. I see why it's not plausible to get started on now, since much of it depends on having AGIs or proto-AGIs to play with.

I guess I shouldn't respond too much in public until you've published the doc, but:

If I'm interpreting correctly, a number of the things you intend to try involve having a misaligned (but controlled) proto-AGI run experiments involving training (or otherwise messing with in some way) an AGI. I hope you have some empathy for the internal screaming I have toward this category of things.
A bunch of the ideas do seem reasonable to want to try (given that you had AGIs to play with, and were very confident that doing so wouldn't allow them to escape or otherwise gain influence). I am sympathetic to the various ideas that involve gaining understanding of how to influence goals better by training in various ways.
There are chunks of these ideas that definitely aren't "prosaic and relatively unenlightened ML research", and involve very-high-trust security stuff or non-trivial epistemic work.
I'd be a little more sympathetic to these kinda desperate last-minute things if I had no hope in literally just understanding how to build task-AGI properly, in a well understood way. We can do this now. I'm baffled that almost all of the EA-alignment-sphere has given up on even trying to do this. From talking to people this weekend this shift seems downstream of thinking that we can make AGIs do alignment work, without thinking this through in detail.

The total quantity of risk reduction is unclear, but seems substantial to me. I'd guess takeover risk goes from 50% to 5% if you do a very good job at executing on huge amounts of prosaic and relatively unenlightened ML research at the relevant time

Agree it's unclear. I think the chance of most of the ideas being helpful depends on some variables that we don't clearly know yet. I think 90% risk improvement can't be right, because there's a lot of correlation between each of the things working or failing. And a lot of the risk comes from imperfect execution of the control scheme, which adds on top.

One underlying intuition that I want to express: The world where we are making proto-AGIs run all these experiments is pure chaos. Politically and epistemically and with all the work we need to do. I think pushing toward this chaotic world is much worse than other worlds we could push for right now.

But if I thought control was likely to work very well and saw a much more plausible path to alignment among the "stuff to try", I'd think it was a reasonable strategy.

I also think that getting the ML community to work on things effectively is probably substantially harder than getting AIs to work on things effectively

On some axes, but won't there be axes where AIs are more difficult than humans also? Sycophancy & slop being the most salient. Misalignment issues being another.

This work isn't extremely easy to verify or scale up (such that I don't think "throw a billion dollars at it" just works),

This makes sense now. But I think this line should make you worry about whether you can make controlled AIs do it.

campbell-hutcheson-1 on Campbell Hutcheson's Shortform

The Suchir Balaji Autopsy Report came out recently

Overall, I think it’s quite compelling evidence it was a suicide.

The points against it being suicide are:

The somewhat awkward angle of the gun
The lack of gunpowder on the skin surrounding the wound
The lack of stippling around the wound
Some small potential inconsistencies between the parents' account and the police report (such as the claim by the parents that the father talked to Suchir by phone on Friday)
Had a project to live for as a whistleblower against OpenAI
Suchir’s family insists he wasn’t suicidal; he seemed overall to be living a good life

The points for it are:

No signs of forced entry or struggle
The gun was purchased by and registered to Suchir
There was gunpowder residue on Suchir's hands and Suchir's DNA was on the gun
The door to his apartment was locked
Internet searches on his computer related to “total gray matter volume” and “white matter” in brain
The toxicology report shows he was drunk
The base-rate of suicide is much higher than homicide
Theories about who would have a motivation to kill him seem far-fetched to me; OpenAI is getting much worse press from the suicide than they likely would have if he hadn’t killed himself

General notes summarizing the report:

Suchir was found by the police on Tuesday, 11/26/24, at his apartment at 188 Buchanan Street, Apartment #409, as part of a wellness check (requested by his mother)
Apartment was locked with a deadbolt, no signs of forced entry to the apartment or signs of struggle within the apartment; desktop computer with searches for "total gray matter volume" and "white matter"; key for apartment was found in unit
Last communicated with his mother via text on Friday, 11/22/2024 (the parents in interviews have claimed that their last contact with him was when he talked with his father by phone on Friday night but the police report doesn’t mention this)
Death was from gunshot wound to the head; entry point at the glabella of the forehead (between the eyes), centered approximately 13 cm below the top of the head and 0.5 cm right of the anterior midline
Bullet traveled down and back and into the neck, where it was recovered; soot and unburned gunpowder were not readily visible on the skin, no gunpowder stippling was found around the wound
Glock pistol found at scene; 4 live rounds, 1 spent casing; pistol was purchased and registered by Suchir on 01/04/2024; gunpowder residue was found on both of Suchir’s hands; Suchir's DNA was found on the pistol; markings on the bullet recovered during examination matched unique striations left by the pistol's barrel
Toxicology report showed .178 BAC level (very intoxicated); blood Amphetamine levels between 35 ng/mL and 39 ng/mL (not very notable); GHB levels between 54,000 ng/mL and 67,000 ng/mL (apparently this is within normal ranges or how much buildup of GHB you might naturally expect given the body had been decaying for a few days)
Decomposition of the body was moderate when found.