LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

The King and the Golem - The Animation
Writer · 2024-11-08T18:23:10.935Z · comments (0)

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (10)

[link] New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters
Jesse Hoogland (jhoogland) · 2024-11-27T22:06:12.914Z · comments (3)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

Another argument against maximizer-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (4)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (6)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (6)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (27)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

Brief analysis of OP Technical AI Safety Funding
22tom (thomas-barnes) · 2024-10-25T19:37:41.674Z · comments (4)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (20)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

abstractapplic on I Finally Worked Through Bayes' Theorem (Personal Achievement)

Congrats on applying Bayes; unfortunately, you applied it to the wrong numbers.

The key point is that "Question 3: Bayes" is describing a new village, with demographics slightly different to the village in the first half of your post. You grandfathered in the 0.2 from there, when the equivalent number in Village Two is 0.16 (P(Cat) = P(Witch with Cat) + P(Muggle with Cat) = 0.1*0.7 + 0.9*0.1 = 0.07 + 0.09 = 0.16), for a final answer of 43.75%.

(The meta-lesson here is not to trust LLMs to give you info you can't personally verify, and especially not to trust them to check anything.)

ETA: Also, good on you for posting this. I think LW needs more numbery posts, more 101-level posts, and more falsifiable posts; a numbery 101-level falsifiable post gets a gold star (even if it ends up falsified).

martin-randall on Evaluating the historical value misspecification argument

This is good news because this is more in line with my original understanding of your post. It's difficult because there are multiple closely related problems of varying degrees of lethality and we had updates on many of them between 2007 and 2023. I'm going to try to put the specific update you are pointing at into my own words.

From the perspective of 2007, we don't know if we can lossilly extracting human values into a convenient format using human intelligence and safe tools. We know that a superintelligence can do it (assuming that "human values" is meaningful), but we also know that if we try to do this with an unaligned superintelligence then we all die.

If this problem is unsolvable then we potentially have to create a seed AI using some more accessible value, such as corrigibility, and try to maintain that corrigibility as we ramp up intelligence. This then leads us to the problem of specifying corrigibility, and we see "Corrigibility is anti-natural to consequentialist reasoning" on List of Lethalities [LW · GW].

If this problem is solvable then we can use human values sooner and this gives us other options. Maybe we can find a basin of attraction around human values [LW · GW] for example.

The update between 2007 and 2023 is that the problem appears solvable. GPT-4 is a safe tool (it exists and we aren't extinct yet) and does a decent job. A more focused AI could do the task better without being riskier.

This does not mean that we are not going to die. Yudkowsky has 43 items on List of Lethalities. This post addresses part of item 24. The remaining items are sufficient to kill us ~42.5 times. It's important to be able to discuss one lethality at a time if we want to die with dignity.

ari_zerner on I Finally Worked Through Bayes' Theorem (Personal Achievement)

Congrats!

keltan on keltan's Shortform

Hu. That is extremely useful. Thank you.

I've got a lot of singing out of AVM. While my current method works well for this, I find it more challenging than eliciting harmful outputs.

jblack on Lorec's Shortform

It's an arbitrary convention. We could have equally well chosen a convention in which a left hand rule was valid. (Really a whole bunch of such conventions)
In the Newtonian 2-point model gravity is a purely radial force and so conserves angular momentum, which means that velocity remains in one plane. If the bodies are extended objects, then you can get things like spin-orbit coupling which can lead to orbits not being perfectly planar if the rotation axes aren't aligned with the initial angular momentum axis.
If there are multiple bodies then trajectories can be and usually will be at least somewhat non-planar, though energy losses without corresponding angular momentum losses can drive a system toward a more planar state.
Zero dimensions would only be possible if both the net force and initial velocity were zero, which can't happen if gravity is the only applicable force and there are two distinct points.
In general relativity gravity isn't really a force and isn't always radial, and orbits need not always be planar and usually aren't closed curves anyway. Though again, many systems will tend to approach a more planar state.

lsusr on How can I convince my cryptobro friend that S&P500 is efficient?

The existence of people like your friend are why the market looks efficient to people like you.

lsusr on Open Thread Fall 2024

No idea. My favorite stuff is cryptic and self-referential, and I think IQ is a reasonable metric for assessing intelligence statistically, for a group of people [LW · GW].

michaeldickens on Analysis of Global AI Governance Strategies

Also, I don't feel that this article adequately addressed the downside of SA that it accelerates an arms race. SA is only favored when alignment is easy with high probability and you're confident that you will win the arms race, and you're confident that it's better for you to win than for the other guy[1], and you're talking about a specific kind of alignment where an "aligned" AI doesn't necessarily behave ethically, it just does what its creator intends.

[1] How likely is a US-controlled (or, more accurately, Sam Altman/Dario Amodei/Mark Zuckerberg-controlled) AGI to usher in a global utopia? How likely is a China-controlled AGI to do the same? I think people are too quick to take it for granted that the former probability is larger than the latter.

self on Resist the Happy Death Spiral

Splitting the Great Idea into parts

Applied to "The Sequences", or Rationality:

a collection of good predictive models
a foundation for a culture more productive and virtuous than mainstream culture

Treating every additional detail as burdensome

It helps to apply scepticism to every post, and internally rank posts by usefulness and credence.

michaeldickens on Analysis of Global AI Governance Strategies

Cooperative Development (CD) is favored when alignment is easy and timelines are longer. [...]

Strategic Advantage (SA) is more favored when alignment is easy but timelines are short (under 5 years)

I somewhat disagree with this. CD is favored when alignment is easy with extremely high probability. A moratorium is better given even a modest probability that alignment is hard, because the downside to misalignment is so much larger than the downside to a moratorium.[1] The same goes for SA—it's only favored when you are extremely confident about alignment + timelines.

[1] Unless you believe a moratorium has a reasonable probability of permanently preventing friendly AI from being developed.