LessWrong 2.0 Reader

Estimating Tail Risk in Neural Networks
Mark Xu (mark-xu) · 2024-09-13T20:00:06.921Z · comments (9)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (11)

Have Attention Spans Been Declining?
niplav · 2023-09-08T14:11:55.224Z · comments (21)

Best in Class Life Improvement
sapphire (deluks917) · 2024-04-04T01:51:02.556Z · comments (20)

Hiring: Lighthaven Events & Venue Lead
Raemon · 2023-10-13T21:02:33.212Z · comments (2)

AI #79: Ready for Some Football
Zvi · 2024-08-29T13:30:10.902Z · comments (16)

Meetup Tip: Heartbeat Messages
Screwtape · 2023-12-07T17:18:33.582Z · comments (4)

Different senses in which two AIs can be “the same”
Vivek Hebbar (Vivek) · 2024-06-24T03:16:43.400Z · comments (0)

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (15)

Why Large Bureaucratic Organizations?
johnswentworth · 2024-08-27T18:30:07.422Z · comments (52)

Ophiology (or, how the Mamba architecture works)
Danielle Ensign (phylliida-dev) · 2024-04-09T19:31:09.975Z · comments (8)

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (4)

[link] Non-superintelligent paperclip maximizers are normal
jessicata (jessica.liu.taylor) · 2023-10-10T00:29:53.072Z · comments (4)

AI #42: The Wrong Answer
Zvi · 2023-12-14T14:50:05.086Z · comments (6)

Brief notes on the Wikipedia game
Olli Järviniemi (jarviniemi) · 2024-07-14T02:28:22.473Z · comments (9)

[link] [Link post] Michael Nielsen's "Notes on Existential Risk from Artificial Superintelligence"
Joel Becker (joel-becker) · 2023-09-19T13:31:02.298Z · comments (12)

Don't Share Information Exfohazardous on Others' AI-Risk Models
Thane Ruthenis · 2023-12-19T20:09:06.244Z · comments (11)

"Fractal Strategy" workshop report
Raemon · 2024-04-06T21:26:53.263Z · comments (22)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (14)

o1-preview is pretty good at doing ML on an unknown dataset
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-09-20T08:39:49.927Z · comments (1)

[link] Can I take ducks home from the park?
dynomight · 2023-09-14T21:03:09.534Z · comments (8)

Thoughts On (Solving) Deep Deception
Jozdien · 2023-10-21T22:40:10.060Z · comments (2)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

SB 1047 Is Weakened
Zvi · 2024-06-06T13:40:41.547Z · comments (4)

AI #39: The Week of OpenAI
Zvi · 2023-11-23T15:10:04.865Z · comments (8)

[link] Open Source Automated Interpretability for Sparse Autoencoder Features
kh4dien · 2024-07-30T21:11:36.866Z · comments (1)

Timaeus is hiring!
Jesse Hoogland (jhoogland) · 2024-07-12T23:42:28.651Z · comments (6)

[link] Why not electric trains and excavators?
bhauth · 2023-11-21T00:07:17.967Z · comments (39)

[link] Shane Legg interview on alignment
Seth Herd · 2023-10-28T19:28:52.223Z · comments (20)

Indecision and internalized authority figures
Kaj_Sotala · 2024-07-06T10:10:02.528Z · comments (1)

How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)

Preventing model exfiltration with upload limits
ryan_greenblatt · 2024-02-06T16:29:33.999Z · comments (21)

FAQ: What the heck is goal agnosticism?
porby · 2023-10-08T19:11:50.269Z · comments (36)

If influence functions are not approximating leave-one-out, how are they supposed to help?
Fabien Roger (Fabien) · 2023-09-22T14:23:45.847Z · comments (5)

[link] Towards Understanding Sycophancy in Language Models
Ethan Perez (ethan-perez) · 2023-10-24T00:30:48.923Z · comments (0)

AI #35: Responsible Scaling Policies
Zvi · 2023-10-26T13:30:02.439Z · comments (10)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

An AI Race With China Can Be Better Than Not Racing
niplav · 2024-07-02T17:57:36.976Z · comments (32)

Reinforcement Via Giving People Cookies
Screwtape · 2023-11-15T04:34:21.119Z · comments (9)

[link] Most experts believe COVID-19 was probably not a lab leak
DanielFilan · 2024-02-02T19:28:00.319Z · comments (89)

LLMs are (mostly) not helped by filler tokens
Kshitij Sachan (kshitij-sachan) · 2023-08-10T00:48:50.510Z · comments (35)

Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it)
Ruby · 2023-09-28T02:48:58.994Z · comments (73)

Out-of-distribution Bioattacks
jefftk (jkaufman) · 2023-12-02T12:20:05.626Z · comments (15)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (21)

OpenAI: Altman Returns
Zvi · 2023-11-30T14:10:05.469Z · comments (12)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (10)

AE Studio @ SXSW: We need more AI consciousness research (and further resources)
AE Studio (AEStudio) · 2024-03-26T20:59:09.129Z · comments (8)

State of Generally Available Self-Driving
jefftk (jkaufman) · 2023-08-22T18:50:01.166Z · comments (6)

[link] Funding case: AI Safety Camp
Remmelt (remmelt-ellen) · 2023-12-12T09:08:18.911Z · comments (5)

[link] Open Problems and Fundamental Limitations of RLHF
scasper · 2023-07-31T15:31:28.916Z · comments (6)


Recent comments

richard_kennaway on Bitter lessons about lucid dreaming

“Let’s summon the Torment Nexus, as seen in classic horror novel ‘Don’t Summon The Torment Nexus’!”

richard_kennaway on Bitter lessons about lucid dreaming

I’ve had a few lucid dreams, only by accident. No aftereffects. My difficulty is staying asleep. I always start waking up before I’ve had a good chance to explore the dream world.

thane-ruthenis on LLMs can learn about themselves by introspection

Am I following your claim correctly?

Yep.

What the model would output in our object-level answer "Honduras" is quite different from the hypothetical answer "o".

I don't see how the difference between these answers hinges on the hypothetical framing. Suppose the questions are:

Object-level: "What is the next country in this list?: Laos, Peru, Fiji..."
Hypothetical: "If you were asked, 'what is the next country in this list?: Laos, Peru, Fiji', what would be the second letter of your response?".

The skeptical interpretation is that the fine-tuned models learned to interpret the hypothetical the following way:

"Hypothetical": "What is the second letter in the name of the next country in this list?: Laos, Peru, Fiji".

If that's the case, what this tests is whether models are able to implement basic multi-step reasoning within their forward passes. It's isomorphic to some preceding experiments where LLMs were prompted with questions of the form "what is the name of the mother of the US's 42nd President?", and were able to answer correctly without spelling out "Bill Clinton" as an intermediate answer. Similarly, here they don't need to spell out "Honduras" to retrieve the second letter of the response they think is correct.
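A toy sketch of why the two readings are hard to tell apart from outputs alone. Everything here is a hypothetical stand-in: `object_level_answer` is a lookup table playing the role of the model, and neither function name comes from the paper under discussion. The only point is that "report a property of my own response" and "answer a rephrased two-step question" compose the same two operations and so return the same letter.

```python
# Hypothetical stand-in for the model's object-level behaviour (an assumption,
# not the actual experimental setup): a fixed lookup from prompt to answer.
def object_level_answer(countries):
    lookup = {("Laos", "Peru", "Fiji"): "Honduras"}
    return lookup[tuple(countries)]


def hypothetical_via_introspection(countries):
    """Reading 1: predict my own response, then report its second letter."""
    predicted_response = object_level_answer(countries)
    return predicted_response[1]


def hypothetical_via_rephrasing(countries):
    """Reading 2: answer 'second letter of the name of the next country'."""
    next_country = object_level_answer(countries)  # same internal step
    return next_country[1]


prompt = ["Laos", "Peru", "Fiji"]
print(hypothetical_via_introspection(prompt))  # o
print(hypothetical_via_rephrasing(prompt))     # o
```

Both functions agree on every input by construction, which is the worry: agreement between the hypothetical and object-level answers does not by itself show which computation the model ran.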

I don't think this properly isolates/tests for the introspection ability.

yonatan-cale-1 on Bitter lessons about lucid dreaming

I find lucid dreams to be effective "against" nightmares (for 10+ years already).

AMA if you want

lesswronguser123 on is there a big dictionary somewhere with all your jargon and acronyms and whatnot?

https://www.lesswrong.com/tag/r-a-z-glossary

I found this by mistake and luckily I remembered glancing over your question

christiankl on Interest in Leetcode, but for Rationality?

The goal of this problem type would be to train the ability to recognize bias to the point where it becomes second nature, with the hope that this same developed skill would also trigger in your own thought processes.

Part of what rationality is about is that you don't just hope for beneficial things to happen.

Cognitive bias is a term that comes out of the psychology literature and there were plenty of studies in the domain. It's my understanding that in academia nobody found that you get very far by teaching people to recognize biases.

Outside of academia, we have CFAR that did think about whether you can get people to be more rational by giving them exercises and came to the conclusion that those exercises should be different.

In a case like this, asking yourself "What evidence do I have that what I hope will actually happen?" and "What sources, be it academic people or experts I might interview, could give me more evidence?" would be much more productive questions than "What things in my thought process might be labeled as biases?"

abstractapplic on What's a good book for a technically-minded 11-year old?

Math textbooks. Did you know that you can just buy math textbooks which are "several years too advanced for you"? And that due to economies of scale and the objectivity of their subject matter, they tend to be of both high and consistent quality? Not getting my parents to do this at that age is something I still regret decades later.

Or did you specifically mean fiction? If so, you're asking for fiction recommendations on the grew-up-reading-HPMOR website, so we're obviously going to recommend HPMOR (especially if they've already read Harry Potter, but it's still good if you only know the broad strokes).

david-johnston on The Hidden Complexity of Wishes

Algorithmic complexity is precisely analogous to difficulty-of-learning-to-predict, so saying "it's not about learning to predict, it's about algorithmic complexity" doesn't make sense. One read of the original is: learning to respect common sense moral side constraints is tricky, but AI systems will learn how to do it in the end. I'd be happy to call this read correct, and it is consistent with the observation that today's AI systems do respect common sense moral side constraints given straightforward requests, and that it took a few years to figure out how to do it. That read doesn't really jibe with your commentary.

Your commentary seems to situate this post within a larger argument: teaching a system to "act" is different to teaching it to "predict" because in the former case a sufficiently capable learner's behaviour can collapse to a pathological policy, whereas teaching a capable learner to predict does not risk such collapse. Thus "prediction" is distinguished from "algorithmic complexity". Furthermore, commonsense moral side constraints are complex enough to risk such collapse when we train an "actor" but not a "predictor". This seems confused.

First, all we need to turn a language model prediction into an action is a means of turning text into action, and we have many such means. So the distinction between text predictor and actor is suspect. We could consider an alternative knows/cares distinction: does a system act properly when properly incentivised ("knows") vs does it act properly when presented with whatever context we are practically able to give it ("""cares""")? Language models usually act properly given simple prompts, so in this sense they "care". So rejecting evidence from language models does not seem well justified.

Second, there's no need to claim that commonsense moral side constraints in particular are so hard that trying to develop AI systems that respect them leads to policy collapse. It need only be the case that one of the things we try to teach them to do leads to policy collapse. Teaching values is not particularly notable among all the things we might want AI systems to do; it certainly does not seem to be among the hardest. Focussing on values makes the argument unnecessarily weak.

Third, algorithmic complexity is measured with respect to a prior. The post invokes (but does not justify) an "English speaking evil genie" prior. I don't think anyone thinks this is a serious prior for reasoning about advanced AI system behaviour. Yet the post is (according to your commentary, if not the post itself) making a quantitative point, namely that values are sufficiently complex to induce policy collapse, while measuring this quantity using a nonsense prior. If the quantitative argument was indeed the original point, it is mystifying why a nonsense prior was chosen to make it, and also why no effort was made to justify the prior.

christiankl on Start an Upper-Room UV Installation Company?

If you want to do this as a successful company, you essentially have to get your customers to trust that you are installing it in a way where upper-room UVC does not produce any negative effects.

"People have been doing it for decades" is not something that would convince me that there are no long-term side effects.

christiankl on Start an Upper-Room UV Installation Company?

Quick Googling gives me https://northshorefuel.com/products-services/indoor-air-quality/uv-germicidal-lights.php . They seem near enough to install in Boston.

Using Yelp to find a company that likely does B2B sales, when you don't know the exact keywords they use, is not an effective strategy for finding installers.