LessWrong 2.0 Reader

[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

Userscript to always show LW comments in context vs at the top
Vlad Sitalo (harcisis) · 2023-11-21T17:53:30.418Z · comments (8)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Auditing failures vs concentrated failures
ryan_greenblatt · 2023-12-11T02:47:35.703Z · comments (0)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

AI #38: Let’s Make a Deal
Zvi · 2023-11-16T19:50:05.442Z · comments (2)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

The Next ChatGPT Moment: AI Avatars
kolmplex (luke-man) · 2024-01-05T20:14:10.074Z · comments (10)

[link] Project ideas: Epistemics
Lukas Finnveden (Lanrian) · 2024-01-05T23:41:23.721Z · comments (4)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

2023 LessWrong Community Census, Request for Comments
Screwtape · 2023-11-01T16:32:19.102Z · comments (37)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

[question] Where is the Town Square?
Gretta Duleba (gretta-duleba) · 2024-02-13T03:53:18.205Z · answers+comments (8)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

[link] An EPUB of Arbital's AI Alignment section
mesaoptimizer · 2023-10-16T19:36:29.109Z · comments (1)

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld
DanielFilan · 2023-10-03T21:50:07.552Z · comments (0)

Childhood and Education Roundup #4
Zvi · 2024-01-30T13:50:06.033Z · comments (10)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

[link] How bad is chlorinated water?
bhauth · 2023-12-13T18:00:12.640Z · comments (18)

Job Listing: Managing Editor / Writer
Gretta Duleba (gretta-duleba) · 2024-02-21T23:41:26.818Z · comments (2)

Why does generalization work?
Martín Soto (martinsq) · 2024-02-20T17:51:10.424Z · comments (16)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (41)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (3)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

Ambiguity in Prediction Market Resolution is Still Harmful
aphyer · 2024-07-31T20:32:40.217Z · comments (17)

Incidental polysemanticity
Victor Lecomte (victor-lecomte) · 2023-11-15T04:00:00.000Z · comments (7)

Protocol evaluations: good analogies vs control
Fabien Roger (Fabien) · 2024-02-19T18:00:09.794Z · comments (10)

[link] cold aluminum for medicine
bhauth · 2023-12-16T14:38:03.260Z · comments (4)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

Recent comments

yitz on Ayn Rand’s model of “living money”; and an upside of burnout

Reminds me of Internal Family Systems, which has a nice amount of research behind it if you want to learn more.

zero-contradictions on Proposal to increase fertility: University parent clubs

This is a great idea. I've brainstormed and compiled a list of additional ideas that could also help raise fertility rates. https://zerocontradictions.net/faqs/overpopulation#boosting-western-fertility

michaeldickens on Announcing turntrout.com, my new digital home

Do you think a 3-state dark mode selector is better than a 1-state (where "auto" is the only state)? My website is 1-state, on the assumption that auto will work for almost everyone and it lets me skip the UI clutter of having a lighting toggle that most people won't use.

Also, I don't know if the site has been updated but it looks to me like turntrout.com's two modes aren't dark and light, they're auto and light. When I set Firefox's appearance to dark or auto, turntrout.com's dark mode appears dark, but when I set Firefox to light, turntrout.com appears light. turntrout.com's light mode appears to be light regardless of my Firefox setting.

justinpombrio on "It's a 10% chance which I did 10 times, so it should be 100%"

However it is true that doing something with a 10% success rate 10 times will net you an average of 1 success.

For the easier to work out case of doing something with a 50% success rate 2 times:

25% chance of 0 successes
50% chance of 1 success
25% chance of 2 successes

Gives an average of 1 success.

Of course this only matters for the sort of thing where 2 successes is better than 1 success:

10% chance of finding a monogamous partner 10 times yields about 0.65 monogamous partners in expectation (1 - 0.9^10).
10% chance of finding a polyamorous partner 10 times yields 1.00 polyamorous partners in expectation.
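
The arithmetic in this comment can be checked with a short sketch. It assumes independent tries with a fixed success probability (the binomial model); the function names are illustrative and not from the comment. `expected_successes` corresponds to the "polyamorous" figure, where every success counts, and `prob_at_least_one` to the "monogamous" figure, where only the first success matters.

```python
from math import comb


def success_distribution(p: float, n: int) -> list[float]:
    """Probability of exactly k successes in n independent tries, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def expected_successes(p: float, n: int) -> float:
    """Average number of successes; equals n * p for a binomial distribution."""
    return sum(k * prob for k, prob in enumerate(success_distribution(p, n)))


def prob_at_least_one(p: float, n: int) -> float:
    """Chance of one or more successes, i.e. 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    print(success_distribution(0.5, 2))  # the 50%-twice case from the comment
    print(expected_successes(0.1, 10))   # every success counts
    print(prob_at_least_one(0.1, 10))    # only the first success counts
```

The two quantities differ because the expectation counts repeat successes, while the at-least-one probability caps the payoff at a single success.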

boris-kashirin on "It's a 10% chance which I did 10 times, so it should be 100%"

e^3 is ~20, so for large n you get a 95% chance of success by doing 3n attempts.
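
A quick numerical check of this claim, as a sketch. It assumes each attempt succeeds independently with probability 1/n, which is the setup implied by the post title rather than stated in the comment; `prob_success_within` is a hypothetical helper name. Since (1 - 1/n)^(3n) approaches e^-3 (roughly 1/20) as n grows, the chance of at least one success approaches 1 - e^-3.

```python
from math import exp


def prob_success_within(n: int, multiple: int = 3) -> float:
    """Chance of at least one success in multiple * n tries at 1/n odds each."""
    return 1 - (1 - 1 / n) ** (multiple * n)


# Large-n limit: (1 - 1/n)^(3n) -> e^-3, so the success chance -> 1 - e^-3.
LIMIT = 1 - exp(-3)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, prob_success_within(n))
    print("limit", LIMIT)
```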

linch on A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Agreed. I was trying to succinctly convey something that I think is underrated, so it was obviously going to miss some nuances.

zach-stein-perlman on A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Open Philanthropy's AI safety work tends toward large grants in the high hundreds of thousands or low millions, meaning individuals and organizations with lower funding needs won't be funded by them

This is directionally true but projects seeking less money should still apply to OP if relevant; a substantial minority of recent OP AI safety and GCR capacity building grants are <$100K.

philip_b on "It's a 10% chance which I did 10 times, so it should be 100%"

Nice. I have a suggestion for how to improve the article: put a clearly stated theorem somewhere in the middle, in its own block, like in academic math articles.

myles-h on What are Emotions?

Wow, thank you so much. This is a lens I totally hadn't considered.

You can see in the post how I was confused about how evolution played a part in "imbuing" material terminal goals into humans. I was like, "but kinetic sculptures were not in the ancestral environment?"

It sounds like rather than imbuing humans with material goals, it has imbued a process by which humans create their own.

I would still define material goals as simply terminal goals which are not defined by some qualia, but it is fascinating that this is what material goals look like in humans.

This also, as you say, makes it harder to distinguish between emotional and material goals in humans, since our material goals are ultimately emotionally derived. In particular, it makes it difficult to distinguish between a goal that is instrumental to an emotional terminal goal and a learned material goal created from reinforced prediction of its expected emotional reward.

E.g. the difference between someone wanting a cookie because it will make them feel good, and someone wanting money as a terminal goal because their brain frequently predicted that money would lead to feeling good.

I still make this distinction between material and emotional goals because this isn't the only way that material goals play out among all agents. For example, my thermostat has simply been directly imbued with the goal of maintaining a temperature. I can also imagine this is how material goals play out in most insects.

Other emotions, like fear, anger, etc., are different. They can be thought of as "tilts" to our cognitive landscape. Even learning that we're experiencing them is tricky. That's why emotional awareness is a subject to learn about, not just something we're born knowing. We need to learn to "feel the tilt". Elevated heart rate might signal fear, anger, or excitement; noticing it or finding other cues is necessary to understand how we're tilted, and how to correct for it if we want to act rationally. Those sorts of emotions "tilt the landscape" of our cognition by making different thoughts and actions more likely, like thoughts of how someone's actions were unfair or physical attacks when we're angry.

This makes a lot of sense. Yeah I was definitely simplifying all emotions to just their qualia effect, without considering their other physiological effects which define them. So I guess in this post when I say "emotion", I really mean "qualia".

But I'm pretty sure that predicted reward is pretty synonymous with what we call "values".

Just to clarify, are you using "reward" here to also mean "positive (or a lack of negative) qualia"? Or is this reinforcement mechanism recursive, so that we might learn to value something because of its predicted reward, but that reward is itself a learned value, and so on, with the base case being an emotional reward? If so, how deep can it go?

benito on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Hm, but I note others at the time felt it clear that this would exacerbate the competition (1, 2).