LessWrong 2.0 Reader

Growth and Form in a Toy Model of Superposition
Liam Carroll (liam-carroll) · 2023-11-08T11:08:04.359Z · comments (7)
[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)
How well do truth probes generalise?
mishajw · 2024-02-24T14:12:19.729Z · comments (11)
Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (13)
[link] More Hyphenation
Arjun Panickssery (arjun-panickssery) · 2024-02-07T19:43:29.086Z · comments (19)
Dragon Agnosticism
jefftk (jkaufman) · 2024-08-01T17:00:06.434Z · comments (60)
[link] Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan (akbir-khan) · 2024-02-07T21:28:10.694Z · comments (14)
I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)
Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (21)
You’re Measuring Model Complexity Wrong
Jesse Hoogland (jhoogland) · 2023-10-11T11:46:12.466Z · comments (15)
We don't understand what happened with culture enough
Jan_Kulveit · 2023-10-09T09:54:20.096Z · comments (21)
[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)
OpenAI: Helen Toner Speaks
Zvi · 2024-05-30T21:10:02.938Z · comments (8)
The Aspiring Rationalist Congregation
maia · 2024-01-10T22:52:54.298Z · comments (23)
Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)
[link] Linkpost: Rishi Sunak's Speech on AI (26th October)
bideup · 2023-10-27T11:57:46.575Z · comments (8)
Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)
[link] Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-11-01T18:10:31.110Z · comments (1)
Reflections on Less Online
Error · 2024-07-07T03:49:44.534Z · comments (15)
[link] Anxiety vs. Depression
Sable · 2024-03-17T00:15:08.255Z · comments (35)
Addressing Feature Suppression in SAEs
Benjamin Wright (Benw8888) · 2024-02-16T18:32:51.927Z · comments (3)
[link] Environmentalism in the United States Is Unusually Partisan
Jeffrey Heninger (jeffrey-heninger) · 2024-05-13T21:23:10.755Z · comments (26)
[link] The Puritans would one-box: evidential decision theory in the 17th century
Jacob G-W (g-w1) · 2023-10-14T20:23:24.346Z · comments (5)
[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)
[link] Hardshipification
Jonathan Moregård (JonathanMoregard) · 2024-05-28T20:02:29.709Z · comments (17)
[link] Nietzsche's Morality in Plain English
Arjun Panickssery (arjun-panickssery) · 2023-12-04T00:57:42.839Z · comments (13)
Natural Latents: The Concepts
johnswentworth · 2024-03-20T18:21:19.878Z · comments (18)
[link] A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien (alexandre-variengien) · 2023-12-19T11:52:27.354Z · comments (3)
Some for-profit AI alignment org ideas
Eric Ho (eh42) · 2023-12-14T14:23:20.654Z · comments (19)
[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)
[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)
Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)
Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)
Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)
AI #51: Altman’s Ambition
Zvi · 2024-02-20T19:50:07.439Z · comments (5)
[link] What are you getting paid in?
Austin Chen (austin-chen) · 2024-07-17T19:23:04.219Z · comments (13)
MATS Winter 2023-24 Retrospective
utilistrutil · 2024-05-11T00:09:17.059Z · comments (28)
Retirement Accounts and Short Timelines
jefftk (jkaufman) · 2024-02-19T18:50:05.231Z · comments (35)
New roles on my team: come build Open Phil's technical AI safety program with me!
Ajeya Cotra (ajeya-cotra) · 2023-10-19T16:47:59.701Z · comments (6)
A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · 2023-12-23T22:13:52.286Z · comments (13)
[link] The Real Fanfic Is The Friends We Made Along The Way
Eneasz · 2023-10-18T19:21:40.431Z · comments (0)
A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)
Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane (ckkissane) · 2024-01-16T00:26:14.767Z · comments (9)
Untrusted smart models and trusted dumb models
Buck · 2023-11-04T03:06:38.001Z · comments (12)
Update on the UK AI Taskforce & upcoming AI Safety Summit
Elliot_Mckernon (elliot) · 2023-10-11T11:37:42.436Z · comments (2)
Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (19)
Agent Boundaries Aren't Markov Blankets. [Unless they're non-causal; see comments.]
abramdemski · 2023-11-20T18:23:40.443Z · comments (11)
We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)
Muddling Along Is More Likely Than Dystopia
Jeffrey Heninger (jeffrey-heninger) · 2023-10-20T21:25:15.459Z · comments (10)
Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)