LessWrong 2.0 Reader

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

Another argument against maximizer-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

E.T. Jaynes Probability Theory: The logic of Science I
Jan Christian Refsgaard (jan-christian-refsgaard) · 2023-12-27T23:47:52.579Z · comments (20)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (6)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (27)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

Recent comments

odd-anon on A better “Statement on AI Risk?”

I don't think this would be a good letter. The military comparison is unhelpful; risk alone isn't a good way to decide budgets. Yet, half the statement is talking about the military. Additionally, call-to-action statements that involve "Spend money on this! If you don't, it'll be catastrophic!" are something that politicians hear on a constant basis, and they ignore most of them out of necessity.

In my opinion, a better statement would be something like: "Apocalyptic AI is being developed. This should be stopped, as soon as possible."

sil-ver on Two flavors of computational functionalism

I don't know if you are going to address this, but if I were to write a sequence of posts on functionalism, I'd start with the problem that "computation" isn't very well defined, and hence functionalism isn't very well-defined, either. In practice it's often clear enough whether or not a system is computing something, but you're going to have a hard time giving a fully general, rigorous, universally applicable definition of what exactly a physical process has to do to count as computing something (and if so, what precisely it is computing). Similarly, your definition of the Practical CF inherits this problem because it's not at all clear what "capturing the dynamics of the brain on some coarse-grained level of abstraction" means. This problem is usually brushed over but imo that's where all the difficulty lies.

(Of course, many people think consciousness is inherently fuzzy, in which case associating it with similarly fuzzy concepts isn't a problem. But I'm assuming you're taking a realist point of view here and assume consciousness is well-defined, since otherwise there's not much of a question to answer. If consciousness is just an abstraction, functionalism becomes vacuously true as a descriptive statement.)

yonatan-cale-1 on Yonatan Cale's Shortform

:)

If you want to try it meanwhile, check out https://github.com/MineDojo/Voyager

avturchin on Magic by forgetting

It will work only if I care for my observations, something like EDT.

christiankl on (Salt) Water Gargling as an Antiviral

I assume salt water has lower side effects, so that seemed like a promising thing to check.

Why do you make that assumption? Besides the antiviral effect of it, I would expect salt water to drain H₂O from the oral mucosa. Do you think the effect is too small to matter? Do you think it's a desirable effect?

knight-lee on A better “Statement on AI Risk?”

This is an important point. AI alignment/safety organizations take money as input and write very abstract papers as their output, which usually have no immediate applications. I agree it may appear very unproductive.

However, if we think from first principles, a lot of other things are like that. For instance, when you go to school, you study the works of Shakespeare, you learn to play the guitar, and you learn how Spanish pronouns work. These things appear to be a complete waste of time. If 50 million students in the US spend 1 hour a day on these kinds of activities, and each hour is valued at only $10, that's $180 billion/year.

But we know these things are not a waste of time, because in hindsight, when you study how students grow up, this work somehow helps them later in life.

Lots of things appear useless, but are valuable for reasons beyond the intuitive set of reasons we evolved to understand.

Studying the nucleus of atoms might appear like a useless curiosity, if you didn't know it would lead to nuclear energy. There are no real-world applications for a long time, and then suddenly there are enormous applications.

Pasteur's studies on fermentation might appear limited to modest winemaking improvements, but they led to the discovery of germ theory, which saved countless lives.

The stone age people studying weird rocks may have discovered obsidian and copper. Those who studied the strange seeds that plants produce may have discovered agriculture.

We don't know how valuable this alignment work is. We should cope with this uncertainty probabilistically: if there is a 50% chance it will help us, the expected benefit per unit of cost is halved, but that doesn't reduce ideal spending to zero.
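
The two back-of-the-envelope figures in this comment can be checked with a short sketch. The 365-day year and the function names are assumptions made for illustration and are not part of the original comment; the benefit-per-cost value of 4.0 is an arbitrary placeholder.

```python
# Rough check of the arithmetic in the comment above.
# Assumption: the hour-a-day figure is counted over all 365 calendar days.

def annual_cost(students: int, hours_per_day: float,
                dollars_per_hour: float, days_per_year: int = 365) -> float:
    """Total yearly value of the time spent, in dollars."""
    return students * hours_per_day * dollars_per_hour * days_per_year


def expected_benefit_per_cost(benefit_per_cost_if_helpful: float,
                              p_helpful: float) -> float:
    """Scale the benefit/cost ratio by the probability that the work helps."""
    return benefit_per_cost_if_helpful * p_helpful


if __name__ == "__main__":
    total = annual_cost(50_000_000, 1, 10)
    print(f"${total / 1e9:.1f} billion/year")  # close to the quoted $180 billion

    # Hypothetical ratio of 4.0: a 50% chance of helping halves it,
    # but the result stays above zero.
    print(expected_benefit_per_cost(4.0, 0.5))
```

With a school-days-only calendar the total would be lower, so the quoted figure is best read as an order-of-magnitude estimate.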

dr_s on Cost, Not Sacrifice

I think it's a very visible example that is brought up particularly often right now. I'm not saying it's all there is to it, but I think the fundamental visceral reaction to the very idea of self-mutilation is an important and often overlooked element of why some people would be put off by the concept. I actually think it makes the whole thing a lot more understandable, in terms of where it comes from, than the generic "well they're just bigoted and evil" stuff people come up with in extremely partisan arguments on the topic. These sorts of psychological processes - the fact that we may first have a gut-level reaction, and only later rationalize it by constructing an ideological framework to justify why the things that repulse us are evil - are very well documented, and happen all over the place. That doesn't mean everyone who disagrees with me does so because of it (nor that everyone who agrees doesn't do it!), but it would be foolish to just pretend this never happens because it sounds a bit offensive to bring up in a debate. The entire concept of rationality is based around the awareness that yeah, we're constantly affected by cognitive biases like these, and separating the wheat from the chaff is hard work.

And by the way, it's an excellent example of the reverse too. Just like people who are not dysphoric are put off by mutilation, people who are dysphoric are put off by the feeling of having something grafted onto their bodies that doesn't belong, which is sort of the flip side of it. Essentially we tend to have a mental image of our bodies and a strong aversion to that shape being altered or disturbed in some way (which makes all kinds of sense evolutionarily, really). Ironically enough, it's probably via the mechanism of empathy that someone can see someone else do something to their body that feels "wrong" and cringe/be grossed out on their behalf (if you think trans issues are controversial, consider the reactions some people can have even to things like piercings in particularly sensitive places).

charlie-steiner on Are You More Real If You're Really Forgetful?

Fair enough.

Yes, it seems totally reasonable for bounded reasoners to consider hypotheses (where a hypothesis like 'the universe is as it would be from the perspective of prisoner #3' functions like treating prisoner #3 as 'an instance of me') that would be counterfactual or even counterlogical for more idealized reasoners.

Typical bounded reasoning weirdness is stuff like seeming to take some counterlogicals (e.g. different hypotheses about the trillionth digit of pi) seriously despite denying 1+1=3, even though there's a chain of logic connecting one to the other. Projecting this into anthropics, you might have a certain systematic bias about which hypotheses you can consider, and yet deny that that systematic bias is valid when presented with it abstractly.

This seems like it makes drawing general lessons about what counts as 'an instance of me' from the fact that I'm a bounded reasoner pretty fraught.

j-bostock on Yonatan Cale's Shortform

I volunteer to play Minecraft with the LLM agents. I think this might be one eval where the human evaluators are easy to come by.

mitchell_porter on Why We Wouldn't Build Aligned AI Even If We Could

For my part, I have been wondering this week what a constructive reply to this would be.

I think your proposed imperatives and experiments are quite good. I hope that they are noticed and thought about. I don't think they are sufficient for correctly aligning a superintelligence, but they can be part of the process that gets us there.

That's probably the most important thing for me to say. Anything else is just a disagreement about the nature of the world as it is now, and isn't as important.