LessWrong 2.0 Reader

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (20)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

Will 2024 be very hot? Should we be worried?
A.H. (AlfredHarwood) · 2023-12-29T11:22:50.200Z · comments (12)

Toy models of AI control for concentrated catastrophe prevention
Fabien Roger (Fabien) · 2024-02-06T01:38:19.865Z · comments (2)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

[link] how birds sense magnetic fields
bhauth · 2024-06-27T18:59:35.075Z · comments (4)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

[link] The Good Balsamic Vinegar
jenn (pixx) · 2024-01-26T19:30:57.435Z · comments (4)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (14)

On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)

On Lex Fridman’s Second Podcast with Altman
Zvi · 2024-03-25T12:20:08.780Z · comments (10)

Does literacy remove your ability to be a bard as good as Homer?
Adrià Garriga-alonso (rhaps0dy) · 2024-01-18T03:43:14.994Z · comments (19)

Cooperating with aliens and AGIs: An ECL explainer
Chi Nguyen · 2024-02-24T22:58:47.345Z · comments (8)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset
aphyer · 2024-06-17T21:29:08.778Z · comments (11)

[link] Bed Time Quests & Dinner Games for 3-5 year olds
Gunnar_Zarncke · 2024-06-22T07:53:38.989Z · comments (0)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (11)

Book Review: Righteous Victims - A History of the Zionist-Arab Conflict
Yair Halberstadt (yair-halberstadt) · 2024-06-24T11:02:03.490Z · comments (8)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation
Benjamin Sturgeon (benjamin-sturgeon) · 2024-03-21T12:32:22.475Z · comments (8)

Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2024-07-11T20:27:00.000Z · comments (63)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)
RP (Complex Bubble Tea) · 2024-02-09T07:00:45.825Z · comments (6)

[link] Announcing Human-aligned AI Summer School
Jan_Kulveit · 2024-05-22T08:55:10.839Z · comments (0)

Paper in Science: Managing extreme AI risks amid rapid progress
JanB (JanBrauner) · 2024-05-23T08:40:40.678Z · comments (2)

Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)

Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)

[link] Finding Backward Chaining Circuits in Transformers Trained on Tree Search
abhayesian · 2024-05-28T05:29:46.777Z · comments (1)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

Apply to the Conceptual Boundaries Workshop for AI Safety
Chipmonk · 2023-11-27T21:04:59.037Z · comments (0)

So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)

AI #52: Oops
Zvi · 2024-02-22T21:50:07.393Z · comments (9)

Gemini 1.0
Zvi · 2023-12-07T14:40:05.243Z · comments (7)

Why you should learn a musical instrument
cata · 2024-05-15T20:36:16.034Z · comments (23)

[link] on the dollar-yen exchange rate
bhauth · 2024-04-07T04:49:53.920Z · comments (21)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

Goal-Completeness is like Turing-Completeness for AGI
Liron · 2023-12-19T18:12:29.947Z · comments (26)

The Shortest Path Between Scylla and Charybdis
Thane Ruthenis · 2023-12-18T20:08:34.995Z · comments (8)

n of m ring signatures
DanielFilan · 2023-12-04T20:00:06.580Z · comments (7)

[link] A starter guide for evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-08T18:24:23.913Z · comments (2)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

On Complexity Science
Garrett Baker (D0TheMath) · 2024-04-05T02:24:32.039Z · comments (19)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (11)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)

An issue with training schemers with supervised fine-tuning
Fabien Roger (Fabien) · 2024-06-27T15:37:56.020Z · comments (12)

Recent comments

avturchin on Magic by forgetting

It will work only if I care about my observations, as in something like EDT.

christiankl on (Salt) Water Gargling as an Antiviral

I assume salt water has lower side effects, so that seemed like a promising thing to check.

Why do you make that assumption? Besides its antiviral effect, I would expect salt water to draw H₂O out of the oral mucosa. Do you think the effect is too small to matter? Do you think it's a desirable effect?

knight-lee on A better “Statement on AI Risk?”

This is an important point. AI alignment/safety organizations take money as input and write very abstract papers as their output, which usually have no immediate applications. I agree it may appear very unproductive.

However, if we think from first principles, a lot of other things are like that. For instance, when you go to school, you study the works of Shakespeare, you learn to play the guitar, and you learn how Spanish pronouns work. These things appear to be a complete waste of time. If 50 million students in the US spend 1 hour a day on these kinds of activities, and each hour is valued at only $10, that's $180 billion/year.
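The figure above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the student count, hours, and hourly value come from the comment, while counting all 365 calendar days is an assumption made here (the comment does not say how many days it uses). Under that assumption the total comes to about $182.5 billion, which rounds to the quoted $180 billion.

```python
# Figures taken from the comment above.
STUDENTS = 50_000_000
HOURS_PER_DAY = 1
DOLLARS_PER_HOUR = 10
# Assumption (not stated in the comment): every calendar day counts.
DAYS_PER_YEAR = 365


def annual_value(students, hours_per_day, dollars_per_hour, days_per_year):
    """Total dollar value of the time spent per year."""
    return students * hours_per_day * dollars_per_hour * days_per_year


total = annual_value(STUDENTS, HOURS_PER_DAY, DOLLARS_PER_HOUR, DAYS_PER_YEAR)
print(f"${total / 1e9:.1f} billion/year")
```

Counting only school days instead would give a noticeably smaller number, so the quoted total is best read as an order-of-magnitude estimate.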

But we know these things are not a waste of time, because in hindsight, when you study how students grow up, this work somehow helps them later in life.

Lots of things appear useless, but are valuable for reasons beyond the intuitive set of reasons we evolved to understand.

Studying the nucleus of atoms might look like a useless curiosity if you didn't know it would lead to nuclear energy. There are no real-world applications for a long time, and then suddenly there are enormous ones.

Pasteur's studies on fermentation might appear limited to modest winemaking improvements, but they led to the discovery of germ theory, which saved countless lives.

The stone age people studying weird rocks may have discovered obsidian and copper. Those who studied the strange seeds that plants produce may have discovered agriculture.

We don't know how valuable this alignment work is. We should cope with this uncertainty probabilistically: if there is a 50% chance it will help us, the expected benefit per unit cost is halved, but that doesn't reduce ideal spending to zero.

dr_s on Cost, Not Sacrifice

I think it's a very visible example that is brought up particularly often right now. I'm not saying it's all there is to it, but I think the fundamental visceral reaction to the very idea of self-mutilation is an important and often overlooked element of why some people would be put off by the concept. I actually think it makes the whole thing a lot more understandable, in terms of where it comes from, than the generic "well they're just bigoted and evil" stuff people come up with in extremely partisan arguments on these topics. These sorts of psychological processes - the fact that we may first have a gut-level reaction, and only later rationalize it by constructing an ideological framework to justify why the things that repulse us are evil - are very well documented, and happen all over the place. That does not mean everyone who disagrees with me does so because of it (nor that everyone who agrees doesn't do it!), but it would be foolish to just pretend this never happens because it sounds a bit offensive to bring up in a debate. The entire concept of rationality is based around the awareness that yeah, we're constantly affected by cognitive biases like these, and separating the wheat from the chaff is hard work.

And by the way, it's an excellent example of the reverse too. Just as people who are not dysphoric are put off by mutilation, people who are dysphoric are put off by the feeling of having something grafted onto their bodies that doesn't belong, which is sort of the flip side of it. Essentially, we tend to have a mental image of our bodies and a strong aversion to that shape being altered or disturbed in some way (which makes all kinds of sense evolutionarily, really). Ironically enough, it's probably via the mechanism of empathy that someone can see someone else do something to their body that feels "wrong" and cringe or be grossed out on their behalf (if you think trans issues are controversial, consider the reactions some people can have even to things like piercings in particularly sensitive places).

charlie-steiner on Are You More Real If You're Really Forgetful?

Fair enough.

Yes, it seems totally reasonable for bounded reasoners to consider hypotheses (where a hypothesis like 'the universe is as it would be from the perspective of prisoner #3' functions like treating prisoner #3 as 'an instance of me') that would be counterfactual or even counterlogical for more idealized reasoners.

Typical bounded reasoning weirdness is stuff like seeming to take some counterlogicals (e.g. different hypotheses about the trillionth digit of pi) seriously despite denying 1+1=3, even though there's a chain of logic connecting one to the other. Projecting this into anthropics, you might have a certain systematic bias about which hypotheses you can consider, and yet deny that that systematic bias is valid when presented with it abstractly.

This seems like it makes drawing general lessons about what counts as 'an instance of me' from the fact that I'm a bounded reasoner pretty fraught.

j-bostock on Yonatan Cale's Shortform

I volunteer to play Minecraft with the LLM agents. I think this might be one eval where the human evaluators are easy to come by.

mitchell_porter on Why We Wouldn't Build Aligned AI Even If We Could

For my part, I have been wondering this week what a constructive reply to this would be.

I think your proposed imperatives and experiments are quite good. I hope that they are noticed and thought about. I don't think they are sufficient for correctly aligning a superintelligence, but they can be part of the process that gets us there.

That's probably the most important thing for me to say. Anything else is just a disagreement about the nature of the world as it is now, and isn't as important.

megasilverfist on The Copernican Revolution from the Inside

Galileo invented the telescope

https://explainingscience.org/2018/03/13/galileo-and-the-telescope/ a bit of a simplification, but not seriously off.

yonatan-cale-1 on Yonatan Cale's Shortform

I think a simple bash tool running as admin could do most of these:

it can get any info on a computer into its context whenever it wants, and it can choose to invoke any computer functionality that a human could invoke, and it can store and retrieve knowledge for itself at will

Regarding

and its training includes the use of those functionalities

I think this isn't a crux because the scaffolding I'd build wouldn't train the model. But as a secondary point, I think today's models can already use bash tools reasonably well.

it's not completely clear to me that it wouldn't already be able to do a slow self-improvement takeoff by itself

This requires skill in ML R&D, which I think is almost entirely not blocked by what I'd build, but I do think it might be reasonable to have my tool not work for ML R&D because of this concern. (That would require it to be closed source, and so on.)

Thanks for raising concerns; I'm happy to hear more if you have them.

ya-polkovnik on Helpful examples to get a sense of modern automated manipulation

I remember when anti-war was a pro-Ukrainian position. Now, to be that kind of "antiwar", you must be for military escalation to destroy Russia and call to "kill the orcs", while Russia is trying to start negotiations.

Well, to be fair, it was like that since the beginning, and pacifism was just a shiny cover.

By the way, I'm not pro-Russian. Srsly. They don't genuinely seek, and would never achieve, denazification, even though it's very important to do.