LessWrong 2.0 Reader

Do Not Mess With Scarlett Johansson
Zvi · 2024-05-22T15:10:03.215Z · comments (7)

[link] The Perceptron Controversy
Yuxi_Liu · 2024-01-10T23:07:23.341Z · comments (18)

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd · 2024-08-05T15:38:09.682Z · comments (22)

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

AI #69: Nice
Zvi · 2024-06-20T12:40:02.566Z · comments (9)

Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)

[link] How LDT helps reduce the AI arms race
Tamsin Leake (carado-1) · 2023-12-10T16:21:44.409Z · comments (13)

Interpreting and Steering Features in Images
Gytis Daujotas (gytis-daujotas) · 2024-06-20T18:33:59.512Z · comments (6)

[link] Static Analysis As A Lifestyle
adamShimi · 2024-07-03T18:29:37.384Z · comments (11)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (0)

[link] AI Safety Hub Serbia Soft Launch
DusanDNesic · 2023-10-20T07:11:48.389Z · comments (1)

Advice to junior AI governance researchers
Akash (akash-wasil) · 2024-07-08T19:19:07.316Z · comments (1)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

On the Debate Between Jezos and Leahy
Zvi · 2024-02-06T14:40:05.487Z · comments (6)

A to Z of things
KatjaGrace · 2023-11-17T05:20:03.134Z · comments (6)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann · 2023-12-01T22:10:25.801Z · comments (11)

[link] A free to enter, 240 character, open-source iterated prisoner's dilemma tournament
Isaac King (KingSupernova) · 2023-11-09T08:24:43.277Z · comments (19)

How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley (roger-d-1) · 2023-11-28T19:56:49.679Z · comments (30)

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

Announcing New Beginner-friendly Book on AI Safety and Risk
Darren McKee · 2023-11-25T15:57:08.078Z · comments (2)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

A gentle introduction to mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:06:16.778Z · comments (0)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Never Drop A Ball
Screwtape · 2023-11-23T04:15:35.834Z · comments (1)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

Thoughts on open source AI
Sam Marks (samuel-marks) · 2023-11-03T15:35:42.067Z · comments (17)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

Recent comments

lesswronguser123 on is there a big dictionary somewhere with all your jargon and acronyms and whatnot?

https://www.lesswrong.com/tag/r-a-z-glossary

I found this by mistake, and luckily I remembered glancing over your question.

christiankl on Interest in Leetcode, but for Rationality?

The goal of this problem type would be to train the ability to recognize bias to the point where it becomes second nature, with the hope that this same developed skill would also trigger in your own thought processes.

Part of what rationality is about is that you don't just hope for beneficial things to happen.

Cognitive bias is a term that comes out of the psychology literature and there were plenty of studies in the domain. It's my understanding that in academia nobody found that you get very far by teaching people to recognize biases.

Outside of academia, we have CFAR that did think about whether you can get people to be more rational by giving them exercises and came to the conclusion that those exercises should be different.

In a case like this, asking yourself "What evidence do I have that what I hope will actually happen?" and "What sources, be it academic people or experts I might interview, could give me more evidence?" would be much more productive questions than "What things in my thought process might be labeled as biases?"

abstractapplic on What's a good book for a technically-minded 11-year old?

Math textbooks. Did you know that you can just buy math textbooks which are "several years too advanced for you"? And that due to economies of scale and the objectivity of their subject matter, they tend to be of both high and consistent quality? Not getting my parents to do this at that age is something I still regret decades later.

Or did you specifically mean fiction? If so, you're asking for fiction recommendations on the grew-up-reading-HPMOR website, so we're obviously going to recommend HPMOR (especially if they've already read Harry Potter, but it's still good if you only know the broad strokes).

david-johnston on The Hidden Complexity of Wishes

Algorithmic complexity is precisely analogous to difficulty-of-learning-to-predict, so saying "it's not about learning to predict, it's about algorithmic complexity" doesn't make sense. One read of the original is: learning to respect common sense moral side constraints is tricky, but AI systems will learn how to do it in the end. I'd be happy to call this read correct, and it is consistent with the observation that today's AI systems do respect common sense moral side constraints given straightforward requests, and that it took a few years to figure out how to do it. That read doesn't really jibe with your commentary.

Your commentary seems to situate this post within a larger argument: teaching a system to "act" is different to teaching it to "predict" because in the former case a sufficiently capable learner's behaviour can collapse to a pathological policy, whereas teaching a capable learner to predict does not risk such collapse. Thus "prediction" is distinguished from "algorithmic complexity". Furthermore, commonsense moral side constraints are complex enough to risk such collapse when we train an "actor" but not a "predictor". This seems confused.

First, all we need to turn a language model prediction into an action is a means of turning text into action, and we have many such means. So the distinction between text predictor and actor is suspect. We could consider an alternative knows/cares distinction: does a system act properly when properly incentivised ("knows") vs does it act properly when presented with whatever context we are practically able to give it ("""cares""")? Language models usually act properly given simple prompts, so in this sense they "care". So rejecting evidence from language models does not seem well justified.

Second, there's no need to claim that commonsense moral side constraints in particular are so hard that trying to develop AI systems that respect them leads to policy collapse. It need only be the case that one of the things we try to teach them to do leads to policy collapse. Teaching values is not particularly notable among all the things we might want AI systems to do; it certainly does not seem to be among the hardest. Focussing on values makes the argument unnecessarily weak.

Third, algorithmic complexity is measured with respect to a prior. The post invokes (but does not justify) an "English speaking evil genie" prior. I don't think anyone thinks this is a serious prior for reasoning about advanced AI system behaviour. But the post is (according to your commentary, if not the post itself) making a quantitative point - values are sufficiently complex to induce policy collapse - but it's measuring this quantity using a nonsense prior. If the quantitative argument was indeed the original point, it is mystifying why a nonsense prior was chosen to make it, and also why no effort was made to justify the prior.

christiankl on Start an Upper-Room UV Installation Company?

If you want to do this as a successful company, you essentially have to get your customers to trust that you are installing it in a way where pointing UVC up does not produce any negative effects.

"People have been doing it for decades" is not something that would convince me that there are no long-term side effects.

christiankl on Start an Upper-Room UV Installation Company?

Quick Googling gives me https://northshorefuel.com/products-services/indoor-air-quality/uv-germicidal-lights.php . They seem near enough to install in Boston.

Using Yelp to find a company that likely does B2B sales, when you don't know the exact keywords they use, is not an effective strategy for finding installers.

jkaufman on Start an Upper-Room UV Installation Company?

You do need to pay attention to what paint is on the ceiling and measure to verify that levels are low in the places people are, but pointing UVC up is something we've done safely for a long time in many places.

michael-roe on Bitter lessons about lucid dreaming

Discussing sleep paralysis might be an infohazard…

The times I’ve entered sleep paralysis it hasn’t bothered me, as I knew what it was.

charlie-steiner on Start an Upper-Room UV Installation Company?

Pointing UVC LEDs at your ceiling seems sketchy. White paint will likely scatter ~5% of UVC, and shiny metal surfaces will scatter more. Try to go below 250nm for reduced reflection (and reduced penetration into human skin) and (more) unwanted chemistry will start happening to the air.

I guess an important question is whether UVC is more harmful than UVB. If it's not any more harmful, then as long as nobody's getting sunburned from being in that room all day, it's probably fine - that 5% scattering is just another name for SPF 20. But if it is more harmful, then sunburn might not be an adequate signal for when it's bad for you.
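The "5% scattering is SPF 20" equivalence is just a reciprocal. A minimal sketch of that arithmetic, assuming (as a simplification not spelled out in the comment) that SPF can be treated as the inverse of the fraction of UV that still reaches the skin; the function name `spf_equivalent` is hypothetical:

```python
def spf_equivalent(fraction_reaching_skin: float) -> float:
    """Return an SPF-like protection factor for a given transmitted fraction.

    Assumption: SPF is modeled as 1 / (fraction of UV that still reaches
    the skin). This ignores wavelength-dependent harm, which is exactly
    the UVC-vs-UVB question raised in the comment.
    """
    if not 0 < fraction_reaching_skin <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction_reaching_skin


# ~5% of upward-pointed UVC scattered back by white paint (figure from the comment)
print(spf_equivalent(0.05))
```

Under this simple model the comparison only holds if UVC is no more harmful per unit dose than UVB; if it is more harmful, the same factor would overstate the protection.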

avturchin on Bitter lessons about lucid dreaming

The main risk is entering a sleep paralysis state, which is itself benign, but some terrifying sounds can be heard during it, and this can cause stress.

Yes, it is possible to wake up from a lucid dream: just think about your sleeping body.