LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

Another argument against maximizer-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

Eugenics Performed By A Blind, Idiot God
omnizoid · 2023-09-17T20:37:13.650Z · comments (10)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

E.T. Jaynes Probability Theory: The logic of Science I
Jan Christian Refsgaard (jan-christian-refsgaard) · 2023-12-27T23:47:52.579Z · comments (20)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

Taxonomy of AI-risk counterarguments
Odd anon · 2023-10-16T00:12:51.021Z · comments (13)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

Never Drop A Ball
Screwtape · 2023-11-23T04:15:35.834Z · comments (1)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

Thoughts on open source AI
Sam Marks (samuel-marks) · 2023-11-03T15:35:42.067Z · comments (17)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

My hopes for alignment: Singular learning theory and whole brain emulation
Garrett Baker (D0TheMath) · 2023-10-25T18:31:14.407Z · comments (5)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

Balsa Update and General Thank You
Zvi · 2023-12-12T20:30:03.980Z · comments (8)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

tahp on Thoughts after the Wolfram and Yudkowsky discussion

I don't think you're being creative enough about solving the problem cheaply, but I also don't think this particular detail is relevant to my main point. Now you've made me think more about the problem, here's me making a few more steps toward trying to resolve my confusion:

The idea with instrumental convergence is that smart things with goals predictably go hard with things like gathering resources and increasing odds of survival before the goal is complete which are relevant to any goal. As a directionally-correct example for why this could be lethal, humans are smart enough to do gain-of-function research on viruses and design algorithms that predict protein folding. I see no reason to think something smarter could not (with some in-lab experimentation) design a virus that kills all humans simultaneously at a predetermined time, and if you can do that without affecting any of your other goals more than you think humans might interfere your goals, then sure, you kill all the humans because it's easy and you might as well. You can imagine somehow making an AI that cares about humans enough not to straight up kill all of them, but if humans are a survival threat, we should expect it to find some other creative way to contain us, and this is not a design constraint you should feel good about.

In particular, if you are an algorithm which is willing to kill all humans, it is likely that humans do not want you to run, and so letting humans live is bad for your own survival if you somehow get made before the humans notice you are willing to kill them all. This is not a good sign for humans' odds of being able to get more than one try to get AI right if most things are concerned with their own survival, even if that concern is only implicit in having any goal whatsoever.

Importantly, none of this requires humans to make a coding error. It only requires a thing with goals and intelligence, and the only apparent way to get around it is to have the smart thing implicitly care about literally every thing that humans care about to the same relative degrees that humans care about them. It's not a formal proof, but maybe it's the beginning of one. Parenthetically, I guess it's also a good reason to have a lot of military capability before you go looking for aliens, even if you don't intend to harm any.

danielfilan on DanielFilan's Shortform Feed

When I wrote that, I wasn't thinking so much about evals / model organisms as stuff like:

putting a bunch of agents in a simulated world and seeing how they interact
weak-to-strong / easy-to-hard generalization

basically stuff along the lines of "when you put agents in X situation, they tend to do Y thing", rather than trying to understand latent causes / capabilities

startattheend on Anvil Problems

This probably makes more sense if you view it as a boolean type, you either "have an anvil" or you don't, and you either have access to fire or you don't. We view a lot of things as booleans (if your clothes get wet, then wet is a boolean). This might be helpful? It connects what might seem like a sort of edge case into something familiar.

But "something that relies on itself" and "something which is usually hard to get, but easy to get more of once you have a bit of it" are a bit more special I suppose. "Catalyst" is a sort of similar yet different idea. You could graph these concepts as dependency relations and try out all permutations to see if more types of problems exists

shankar-sivarajan on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

"Would you destroy a better world to save this one?"

From Ada Palmer's Terra Ignota, that might be an interesting reframing of this wager: you are destroying (the chance of) a better world, more than twice as good as this one, to guarantee the survival of this one.

habryka4 on An alternative way to browse LessWrong 2.0

Not sure what you mean. The API continues to exist (and has existed since the beginning of LW 2.0).

trevorone on AI Safety is Dropping the Ball on Clown Attacks

I'm not sure what to think about this, Thomas777's approach is generally a good one but for both of these examples, a shorter route (that it's cleanly mutually understood to be adding insult to injury as a flex by the aggressor) seems pretty probable. Free speech/censorship might be a better example as plenty of cultures are less aware of information theory and progress.

I don't know what proportion of the people in the US Natsec community understand 'rigged psychological games' well enough to occasionally read books on the topic, but the bar is pretty low for hopping onto fads as tricks only require one person to notice or invent them and then they can simply just get popular (with all kinds of people with varying capabilities/resources/technology and bandwidth/information/deffciencies hopping on the bandwagon).

brambleboy on Is Success the Enemy of Freedom? (Full)

It's been about 4 years. How do you feel about this now?

paul-kent on Belief in Belief

The recent conception of the hostile telepaths problem [LW · GW] goes a long way towards explaining why people believe in belief in the first place.

tomcatfish on Being the (Pareto) Best in the World

In regard to bullet 1, I would caution against relying on this. If you show up to many fields expecting to smash through it because you're smart, you'll be torn to bits in many many fields. This is because the fields that are useful are already being dominated by people who are good at things to the extent that they're economically or emotionally valuable.

The exact example of chess makes this clear. If a smart LWer thinks "Oh, I'll get to the chess leaderboards because I'm really smart", they are going to find out after some weeks of studying that… everyone else on the leaderboards is smart too!

dalcy on Dalcy's Shortform

The critical insight is that this is not always the case!

Let's call two graphs I-equivalent if their set of independencies (implied by d-separation) are identical. A theorem of Bayes Nets say that two graphs are I-equivalent if they have the same skeleton and the same set of immoralities.

This last constraint, plus the constraint that the graph must be acyclic, allows some arrow directions to be identified - namely, across all I-equivalent graphs that are the perfect map of a distribution, some of the edges have identical directions assigned to them.

The IC algorithm (Verma & Pearl, 1990) for finding perfect maps (hence temporal direction) is exactly about exploiting these conditions to orient as many of the edges as possible:

More intuitively, (Verma & Pearl, 1992) and (Meek, 1995) together shows that the following four rules are necessary and sufficient operations to maximally orient the graph according to the I-equivalence (+ acyclicity) constraint:

Anyone interested in further detail should consult Pearl's Causality Ch 2. Note that for some reason Ch 2 is the only chapter in the book where Pearl talks about Causal Discovery (i.e. inferring time from observational distribution) and the rest of the book is all about Causal Inference (i.e. inferring causal effect from (partially) known causal structure).