LessWrong 2.0 Reader

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

A civilization ran by amateurs
Olli Järviniemi (jarviniemi) · 2024-05-30T17:57:32.601Z · comments (7)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

What is "True Love"?
johnswentworth · 2024-08-18T16:05:47.358Z · comments (9)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann (Stuckwork) · 2024-08-24T00:56:46.048Z · comments (9)

[link] Results from an Adversarial Collaboration on AI Risk (FRI)
Josh Rosenberg (josh-rosenberg) · 2024-03-11T20:00:24.642Z · comments (3)

[link] More people getting into AI safety should do a PhD
AdamGleave · 2024-03-14T22:14:48.855Z · comments (24)

Interdictor Ship
lsusr · 2024-08-19T04:59:18.487Z · comments (9)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

[question] What do we know about the AI knowledge and views, especially about existential risk, of the new OpenAI board members?
Zvi · 2024-03-11T14:55:05.128Z · answers+comments (2)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

5 Physics Problems
DaemonicSigil · 2024-03-18T08:05:45.971Z · comments (0)

0th Person and 1st Person Logic
Adele Lopez (adele-lopez-1) · 2024-03-10T00:56:14.446Z · comments (28)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

There Should Be More Alignment-Driven Startups
Vaniver · 2024-05-31T02:05:06.799Z · comments (14)

Originality vs. Correctness
alkjash · 2023-12-06T18:51:49.531Z · comments (17)

[link] Is Claude a mystic?
jessicata (jessica.liu.taylor) · 2024-06-07T04:27:09.118Z · comments (23)

Self-explaining SAE features
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-05T22:20:36.041Z · comments (13)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

"Epistemic range of motion" and LessWrong moderation
habryka (habryka4) · 2023-11-27T21:58:40.834Z · comments (3)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] shoes with springs
bhauth · 2023-12-30T21:46:55.319Z · comments (6)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

What's next for the field of Agent Foundations?
Nora_Ammann · 2023-11-30T17:55:13.982Z · comments (23)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)
LoganStrohl (BrienneYudkowsky) · 2024-02-24T02:56:31.458Z · comments (1)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (11)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

[link] Linkpost: Surely you can be serious
kave · 2024-07-18T22:18:09.271Z · comments (8)

Recent comments

viliam on Alignment is not intelligent

LessWrong has a nice font, and the screenshots are a bit difficult to read. You could have copied the text.

(I am not really interested in debating Claude, btw.)

viliam on James Camacho's Shortform

For utilitarianism, you need to choose a utility function. This is entirely based on your preferences: what you value, and who you value get weighed and summed to create your utility function. I don't see how this differs from selfish egoism: you decide what and who you value, and take actions that maximize these values.

I see a difference in the word "summed". In practice this would probably mean things like cooperating in the Prisoner's Dilemma (maximizing the sum of utility, rather than the utility of an individual player).
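
A minimal sketch of that difference, assuming the conventional textbook payoffs (3/3 for mutual cooperation, 5/0 for unilateral defection, 1/1 for mutual defection); these numbers and the function names are illustrative and do not come from the thread:

```python
# Hypothetical payoff table: (row player's utility, column player's utility).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def best_reply(opponent_move):
    """Move that maximizes only the row player's own utility."""
    return max("CD", key=lambda my_move: PAYOFFS[(my_move, opponent_move)][0])

def sum_maximizing_profile():
    """Pair of moves that maximizes the sum of both players' utilities."""
    return max(PAYOFFS, key=lambda profile: sum(PAYOFFS[profile]))

print(best_reply("C"), best_reply("D"))  # D D
print(sum_maximizing_profile())          # ('C', 'C')
```

Under these assumed payoffs, an individual maximizer defects whatever the other player does, while a sum maximizer picks mutual cooperation.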

mikhail-samin on "The Solomonoff Prior is Malign" is a special case of a simpler argument

I’d bet 1:1 that, conditional on building a CEV-aligned AGI, we won’t consider this type of problem to have been among the top-5 hardest to solve.

Reality-fluid in our universe should pretty much add up to normality, to the extent it’s Tegmark IV (and it’d be somewhat weird for your assumed amount of compute and simulations to exist but not for all computations/maths objects to exist).

If a small fraction of computers simulating this branch stop, this doesn’t make you stop. All configurations of you are computed; simulators might slightly change the relative likelihood of currently being in one branch or another, but they can’t really terminate you.

Furthermore, our physics seems very simple, and most places that compute us probably do it faithfully, on the level of the underlying physics, with no interventions.

I feel like thinking of reality-fluid as just an inverse relationship to the description length might produce wrong intuitions. In Tegmark IV, you still get more reality-fluid if someone simulates you; and it’s less intuitive why this translates into shorter description length. It might be better to think of it as: if all computation/maths exists and I open my eyes in a random place, how often would that happen here? All the places that run this world give some of their reality-fluid to this world. If a place visible from a bunch of other places starts to simulate this universe, it will be visible from slightly more places.

You can think of the entire object of everything, with all of its parts being simulated in countless other parts; or imagine a Markov process, but with worlds giving each other reality-fluid.

In that sense, the resource that we have is the reality-fluid of our future lightcone; it is our endowment, and we can use it to maximize the overall flourishing in the entire structure.

If we make decisions based on how good the overall/average use of the reality-fluid would be, you’ll gain less reality-fluid by manipulating our world the way described in the post than you’ll spend on the manipulation. It’s probably better for you to trade with us instead.

(I also feel like there might be a reasonable way to talk about causal descendants, where the probabilities are whatever abides by the math of probability theory and causality down the nodes we care about, instead of being the likelihoods of opening eyes in different branches in a particular moment of evaluation.)

ape-in-the-coat on Antropical Probabilities Are Fully Explained by Difference in Possible Outcomes

It is an observation selection effect

It's just the simple fact that conditional probability of an event can be different from unconditional one.

Before you toss the coin you can reason only based on priors, and therefore your credence is 1/2. But when a person hears "Hello", they've observed the event "I was selected from a large crowd", which is twice as likely when the coin is Tails; therefore they can update on this information and raise their credence in Tails to 2/3.

This is exactly as surprising as the fact that after you tossed the coin and observed that it's Heads suddenly your credence in Heads is 100%, even though before the coin toss it was merely 50%.
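
The update to 2/3 can be checked with a short Bayes' rule sketch. The selection probability `p` below is an arbitrary illustrative value (the comment only says selection is twice as likely under Tails), and the function name is hypothetical:

```python
from fractions import Fraction

def posterior_tails(prior_tails, p_selected_given_heads, p_selected_given_tails):
    """Credence in Tails after observing 'I was selected', via Bayes' rule."""
    prior_heads = 1 - prior_tails
    joint_tails = prior_tails * p_selected_given_tails
    joint_heads = prior_heads * p_selected_given_heads
    return joint_tails / (joint_tails + joint_heads)

# Assumed value: any p works, since only the 2:1 likelihood ratio matters.
p = Fraction(1, 1000)
credence = posterior_tails(Fraction(1, 2), p, 2 * p)
print(credence)  # 2/3
```

The result does not depend on the size of the crowd, only on the ratio between the two selection probabilities.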

mr-hire on Two flavors of computational functionalism

For me the answer is yes. There's some way of interpreting the colors of grains of sand on the beach as they swirl in the wind that would perfectly implement the Miller-Rabin primality test algorithm. So is the wind + sand computing the algorithm?

dragongod on DeepSeek beats o1-preview on math, ties on coding; will release weights

o1's reasoning trace also does this for different languages (IIRC I've seen Chinese and Japanese and other languages I don't recognise/recall), usually an entire paragraph not a word, but when I translated them it seemed to make sense in context.

steve2152 on Two flavors of computational functionalism

Do you think there are edge cases where I ask “Is such-and-such system running the Miller-Rabin primality test algorithm?”, and the answer is not a clear yes or no, but rather “Well, umm, kinda…”?

(Not rhetorical! I haven’t thought about it much.)

momom2 on Which things were you surprised to learn are not metaphors?

Top of the head like when I'm trying to frown too hard

jeremy-gillen on lemonhope's Shortform

would hopefully include many people who understand that understanding constraints is key and that past research understood some constraints.

Good point, I'm convinced by this.

build on past agent foundations research
I don't really agree with this. Why do you say this?

That's my guess at the level of engagement required to understand something. Maybe just because when I've tried to use or modify some research that I thought I understood, I always realise I didn't understand it deeply enough. I'm probably anchoring too hard on my own experience here; other people often learn faster than me.

(Also I'm confused about the discourse in this thread (which is fine), because I thought we were discussing "how / how much should grantmakers let the money flow".)

I was thinking "should grantmakers let the money flow to unknown young people who want a chance to prove themselves."

knight-lee on A better “Statement on AI Risk?”

It's true that risk alone isn't a good way to decide budgets. You're even more correct that convincing demands to spend money are something politicians learn to ignore out of necessity.

But while risk alone isn't a good way to decide budgets, you have to admit that lots of budget items have the purpose of addressing risk. For example, flood barriers address hurricane/typhoon risk. Structural upgrades address earthquake risk. Some preparations also address pandemic risk.

If you accept that some budget items are meant to address risk, shouldn't you also accept that the amount of spending should be somewhat proportional to the amount of risk? In that case, if the risk of NATO getting invaded is similar in amount to the rogue AGI risk, then the military spending to protect against invasion should be similar in amount to the spending to protect against rogue ASI.

I admit that politicians might not be rational enough to understand this, and there is a substantial probability this statement will fail. But it is still worth trying. The cost is a mere signature and the benefit may be avoiding a massive miscalculation.

Making this statement doesn't prevent others from making an even better statement. Many AI experts have signed multiple statements, e.g. the "Statement on AI Risk," and "Pause Giant AI Experiments." Some politicians and people are more convinced by one argument, while others are more convinced by another argument, so it helps to have different kinds of arguments backed by many signatories. Encouraging AI safety spending doesn't conflict with encouraging AI regulation. I think the competition between different arguments isn't actually that bad.