LessWrong 2.0 Reader

AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)

[link] Cellular reprogramming, pneumatic launch systems, and terraforming Mars: Some things I learned about at Foresight Vision Weekend
jasoncrawford · 2024-01-04T19:33:57.887Z · comments (0)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)

[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)

[link] AI Impacts 2023 Expert Survey on Progress in AI
habryka (habryka4) · 2024-01-05T19:42:17.226Z · comments (1)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Can quantised autoencoders find and interpret circuits in language models?
charlieoneill (kingchucky211) · 2024-03-24T20:05:50.125Z · comments (4)

Ackshually, many worlds is wrong
tailcalled · 2024-04-11T20:23:59.416Z · comments (42)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

On the 2nd CWT with Jonathan Haidt
Zvi · 2024-04-05T17:30:05.223Z · comments (3)

The economy is mostly newbs (strat predictions)
lemonhope (lcmgcd) · 2024-02-01T19:15:49.420Z · comments (6)

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)

Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes (steve2152) · 2024-07-08T15:27:07.402Z · comments (7)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

flowing like water; hard like stone
lsusr · 2024-02-20T03:20:46.531Z · comments (4)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

[link] Solving alignment isn't enough for a flourishing future
mic (michael-chen) · 2024-02-02T18:23:00.643Z · comments (0)

Fifteen Lawsuits against OpenAI
Remmelt (remmelt-ellen) · 2024-03-09T12:22:09.715Z · comments (4)

[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (11)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

[link] David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud.
Morpheus · 2024-01-27T13:21:05.068Z · comments (20)

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

Weak vs Quantitative Extinction-level Goodhart's Law
VojtaKovarik · 2024-02-21T17:38:15.375Z · comments (1)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

[link] Found Paper: "FDT in an evolutionary environment"
the gears to ascension (lahwran) · 2023-11-27T05:27:50.709Z · comments (47)

Appraising aggregativism and utilitarianism
Cleo Nardo (strawberry calm) · 2024-06-21T23:10:37.014Z · comments (10)

[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)

[link] Goodhart's Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 2023-11-25T00:53:26.841Z · comments (2)

Recent comments

ape-in-the-coat on Quantum Immortality: A Perspective if AI Doomers are Probably Right

You are right, and it's a serious counterargument to consider.
You are also right that the Anthropic Trilemma and Magic by Forgetting do not work with path-dependent identity.

Okay, glad we are on the same page here.

However, we can almost recreate the magic machine from the Anthropic Trilemma using path-based identity

I'm not sure I understand your example and how it recreates the magic. Let me try to describe it in my own words, and then you can correct me if I got something wrong.

You are put to sleep. Then you are split into two people. Then, at random, one of them is put into a red room and the other into a green room. Let's say that person 1 is in the red room and person 2 is in the green room. Then person 2 is split into two people: 21 and 22. Both of them are kept in green rooms. Then everyone is awakened. What should be your credence of waking up in a red room?

Here there are three possibilities: a 50% chance to be 1 in a red room, and a 25% chance each to be 21 or 22 in a green room. No matter how many times a person in a green room is split, the total probability of greenness stays the same. Everything is quite normal and there is no magic.

Now let's add a twist.

Instead of putting both 21 and 22 in green rooms, one of them - let it be 21 - is put in a red room.

In this situation, the total probability of being in a red room is P(1) + P(21) = 75%. And if we split 2 further and put more of its parts in red rooms, we get a higher and higher probability of being in a red room. Therefore we get a magical ability to manipulate probability.

Am I getting you correctly?

I do not see anything problematic with such "manipulation of probability". We do not change our estimate just because more people with the same experience are created. We change the estimate because different fractions of people get different experiences. This is no more magical than putting both 1 and 2 into red rooms and noticing that suddenly the probability of being in a red room reached 100%, compared to the initial formulation where it was a mere 50%. Of course it did! That's completely lawful probability-theoretic reasoning.
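The path-dependent rule used in this example (each split divides the parent's probability equally among its copies) can be sketched in Python. The nested-list encoding of a split history and the `credence` function are illustrative assumptions, not anything proposed in the original exchange; the sketch just reproduces the 50%, 75% and 100% figures above.

```python
from fractions import Fraction
from typing import Union

# Hypothetical encoding: a node is either a leaf (the colour of the room a
# copy wakes up in) or a list of children (this person is split into these copies).
Node = Union[str, list]


def credence(node: Node, colour: str) -> Fraction:
    """Probability of waking in `colour` under path-dependent identity:
    every split divides the parent's probability equally among its copies."""
    if isinstance(node, str):
        return Fraction(1) if node == colour else Fraction(0)
    share = Fraction(1, len(node))
    return sum((share * credence(child, colour) for child in node), Fraction(0))


baseline = ["red", ["green", "green"]]  # 1 red; 2 split into 21, 22, both green
twist = ["red", ["red", "green"]]       # 21 moved to a red room
all_red = ["red", "red"]                # both 1 and 2 put into red rooms

print(credence(baseline, "red"))  # 1/2
print(credence(twist, "red"))     # 3/4
print(credence(all_red, "red"))   # 1
```

Splitting a green-room copy further leaves the green total unchanged, while moving more descendants of 2 into red rooms pushes the red total toward 100% without ever exceeding it.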

Notice that we can't actually recreate the Anthropic Trilemma and be certain to win the lottery this way, because we can't move people between branches. Therefore everything adds up to normality.

Also, path-dependent identity opens the door to back-causation and premonition, because if we normalize outputs of some black box where paths are mixed, similar to the magic machine discussed above

We just need to restrict the mixing of the paths, which is a restriction QM imposes anyway. Or maybe I'm missing something? Could you give me an example of such backwards causality? Because as far as I can see, everything is quite straightforward.

The main problem of path-dependent identity is that we assume the existence of a "global hidden variable" for any observer. It is hidden as it can't be measured by an outside viewer and only represents the subjective chances of the observer to be one copy and not another. And it is global as it depends on the observer's path, not their current state. It therefore contradicts the view that mind is equal to a Turing computer (functionalism) and requires the existence of some identity carrier which moves through paths (qualia, quantum continuity, or soul).

It seems like we are just confused about this "identity" thingy and therefore don't know how to reason about it correctly. In such situations we are supposed to:

Acknowledge that we are confused
Stop speculating on top of our confusion and jumping to conclusions based on it
Outline the possible options to the best of our understanding and keep an open mind until we manage to resolve the confusion

It's already clear that "mind" and "identity" are not the same thing. We can talk about the identities of things that do not possess a mind, and identities are unique, while there can exist copies of the same mind. So minds can very well be Turing computers, but identities are something else, or even not a thing at all.

Our intuitive desire to drag in consciousness/qualia/soul also appears completely unhelpful after thinking about it for the first five minutes. Non-conscious minds can do the same probability-theoretic reasoning as conscious ones. Nothing changes if 1, 21 and 22 from the problem above are not humans but programs executed on different computers.

Whatever extra variable we need, it seems to be something that a Laplace's demon would know: knowledge about whether a mind was split into n instances simultaneously or through multiple steps. It does mean that something other than the immediate state of the mind matters for "identity" considerations, but this something can very well be completely physical: just the past history of causes and effects that led to this state of the mind.

yitz on Ayn Rand’s model of “living money”; and an upside of burnout

Reminds me of Internal Family Systems, which has a nice amount of research behind it if you want to learn more.

zero-contradictions on Proposal to increase fertility: University parent clubs

This is a great idea. I've brainstormed and compiled a list of additional ideas that could also help raise fertility rates. https://zerocontradictions.net/faqs/overpopulation#boosting-western-fertility

michaeldickens on Announcing turntrout.com, my new digital home

Do you think a 3-state dark mode selector is better than a 1-state (where "auto" is the only state)? My website is 1-state, on the assumption that auto will work for almost everyone and it lets me skip the UI clutter of having a lighting toggle that most people won't use.

Also, I don't know if the site has been updated, but it looks to me like turntrout.com's two modes aren't dark and light; they're auto and light. When I set Firefox's appearance to dark or auto, turntrout.com's dark mode appears dark, but when I set Firefox to light, turntrout.com's dark mode appears light. turntrout.com's light mode appears light regardless of my Firefox setting.
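The difference between a 1-state and a 3-state selector can be made concrete with a small resolver. This is a hypothetical Python sketch: `resolve_theme` and `behaves_like_auto` are illustrative names, and `browser_preference` stands in for the CSS `prefers-color-scheme` media feature that a real site would read. A 1-state site is the special case where the setting is always "auto".

```python
# The two values a browser can report via prefers-color-scheme.
BROWSER_PREFERENCES = ("light", "dark")


def resolve_theme(site_setting: str, browser_preference: str) -> str:
    """Return the theme a visitor sees under a 3-state selector.

    site_setting is "auto", "light" or "dark"; a 1-state site always uses "auto".
    """
    if site_setting == "auto":
        return browser_preference      # defer to the browser/OS setting
    if site_setting in BROWSER_PREFERENCES:
        return site_setting            # an explicit choice overrides the browser
    raise ValueError(f"unknown site setting: {site_setting!r}")


def behaves_like_auto(resolve, site_setting: str) -> bool:
    """True if a site setting merely mirrors the browser preference."""
    return all(resolve(site_setting, pref) == pref for pref in BROWSER_PREFERENCES)


for setting in ("auto", "light", "dark"):
    print(setting, behaves_like_auto(resolve_theme, setting))
# auto True
# light False
# dark False
```

Under this model, a "dark" option that tracks the browser setting, as reported above, would pass `behaves_like_auto`, i.e. the toggle would really be auto/light.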

justinpombrio on "It's a 10% chance which I did 10 times, so it should be 100%"

However it is true that doing something with a 10% success rate 10 times will net you an average of 1 success.

For the easier-to-work-out case of doing something with a 50% success rate 2 times:

25% chance of 0 successes
50% chance of 1 success
25% chance of 2 successes

Gives an average of 1 success.
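The table above is the binomial distribution with n = 2 and p = 1/2, and its mean is n·p in general. A minimal Python sketch using exact fractions (the helper names are illustrative) reproduces the table and the 10 × 10% case:

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n: int, p: Fraction) -> list[Fraction]:
    """P(exactly k successes) for k = 0..n independent attempts with success rate p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def expected_successes(n: int, p: Fraction) -> Fraction:
    """Mean of the binomial distribution; always equals n * p."""
    return sum(k * prob for k, prob in enumerate(binomial_pmf(n, p)))


print(binomial_pmf(2, Fraction(1, 2)))          # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(expected_successes(2, Fraction(1, 2)))    # 1
print(expected_successes(10, Fraction(1, 10)))  # 1
```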

Of course this only matters for the sort of thing where 2 successes is better than 1 success:

10% chance of finding a monogamous partner 10 times yields 0.65 monogamous partners in expectation.
10% chance of finding a polyamorous partner 10 times yields 1.00 polyamorous partners in expectation.
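The two figures differ because only the first monogamous success counts, so what matters is the probability of at least one success, 1 - 0.9^10 ≈ 0.65. Every polyamorous success counts, so the expectation is n·p = 1. (The often-quoted 1 - 1/e ≈ 0.63 is the limit as n grows with p = 1/n.) A quick Python check, assuming independent attempts:

```python
n, p = 10, 0.1

p_at_least_one = 1 - (1 - p) ** n  # monogamous: only the first success matters
expected_count = n * p             # polyamorous: every success adds one partner

print(round(p_at_least_one, 2))  # 0.65
print(round(expected_count, 2))  # 1.0
```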

boris-kashirin on "It's a 10% chance which I did 10 times, so it should be 100%"

e^3 is ~20, so for an event with probability 1/n and large n, you get a ~95% chance of success by doing 3n attempts.
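Spelling out the step: with success probability 1/n per attempt, the chance that all 3n attempts fail is (1 - 1/n)^(3n), which approaches e^-3 ≈ 1/20 as n grows, leaving roughly a 95% chance of at least one success. A short Python check, assuming independent attempts (`success_prob` is an illustrative helper):

```python
import math


def success_prob(n: int, attempts_factor: int = 3) -> float:
    """P(at least one success) from attempts_factor * n independent tries at probability 1/n."""
    return 1 - (1 - 1 / n) ** (attempts_factor * n)


for n in (10, 100, 1000, 10**6):
    print(n, round(success_prob(n), 4))

print(round(1 - math.exp(-3), 4))  # large-n limit: 0.9502
```

For small n the chance is slightly above the limit, since (1 - 1/n)^n increases toward 1/e.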

linch on A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Agreed. I was trying to succinctly convey something that I think is underrated, so I was obviously going to miss some nuances.

zach-stein-perlman on A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps

Open Philanthropy's AI safety work tends toward large grants in the high hundreds of thousands or low millions, meaning individuals and organizations with lower funding needs won't be funded by them

This is directionally true but projects seeking less money should still apply to OP if relevant; a substantial minority of recent OP AI safety and GCR capacity building grants are <$100K.

philip_b on "It's a 10% chance which I did 10 times, so it should be 100%"

Nice. I have a suggestion for how to improve the article: put a clearly stated theorem somewhere in the middle, in its own block, like in academic math articles.

myles-h on What are Emotions?

Wow, thank you so much. This is a lens I totally hadn't considered.

You can see in the post how I was confused how evolution played a part in "imbuing" material terminal goals into humans. I was like, "but kinetic sculptures were not in the ancestral environment?"

It sounds like rather than imbuing humans with material goals, it has imbued a process by which humans create their own.

I would still define material goals as simply terminal goals which are not defined by some qualia, but it is fascinating that this is what material goals look like in humans.

This also, as you say, makes it harder to distinguish between emotional and material goals in humans, since our material goals are ultimately emotionally derived. In particular, it makes it difficult to distinguish between an instrumental goal toward an emotional terminal goal, and a learned material goal created from reinforced prediction of its expected emotional reward.

E.g. the difference between someone wanting a cookie because it will make them feel good, and someone wanting money as a terminal goal because their brain frequently predicted that money would lead to feeling good.

I still make this distinction between material and emotional goals because this isn't the only way that material goals play out among all agents. For example, my thermostat has simply been directly imbued with the goal of maintaining a temperature. I can also imagine this is how material goals play out in most insects.

Other emotions, like fear, anger, etc. are different. They can be thought of as "tilts" to our cognitive landscape. Even learning that we're experiencing them is tricky. That's why emotional awareness is a subject to learn about, not just something we're born knowing. We need to learn to "feel the tilt". Elevated heart rate might signal fear, anger, or excitement; noticing it or finding other cues are necessary to understand how we're tilted, and how to correct for it if we want to act rationally. Those sorts of emotions "tilt the landscape" of our cognition by making different thoughts and actions more likely, like thoughts of how someone's actions were unfair or physical attacks when we're angry.

This makes a lot of sense. Yeah I was definitely simplifying all emotions to just their qualia effect, without considering their other physiological effects which define them. So I guess in this post when I say "emotion", I really mean "qualia".

But I'm pretty sure that predicted reward is pretty synonymous with what we call "values".

Just to clarify, are you using "reward" here to also mean "positive (or a lack of negative) qualia"? Or is this reinforcement mechanism recursive, whereby we might learn to value something because of its predicted reward, but that reward is also a learned value... and so on, where the base case is an emotional reward? If so, how deep can it go?
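The recursive picture in this question resembles temporal-difference (TD) learning from reinforcement learning, where each state's value is learned from the predicted value of the state it leads to, so only the last link needs a direct reward. The Python sketch below is a toy model under stated assumptions, not a claim about how brains work: the chain of states, the discount `gamma`, the learning rate `alpha` and the `td0_chain` helper are all hypothetical. In this model the depth is unbounded in principle, but each extra link is discounted by `gamma`.

```python
def td0_chain(n_states: int, reward: float = 1.0, gamma: float = 0.9,
              alpha: float = 0.1, episodes: int = 2000) -> list[float]:
    """TD(0) on a deterministic chain s0 -> s1 -> ... -> s_{n-1} -> end.

    Only the final transition yields reward, yet earlier states acquire value
    because each estimate is nudged toward the discounted estimate of its successor.
    """
    values = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            if s == n_states - 1:
                target = reward                 # last step: the direct ("emotional") reward
            else:
                target = gamma * values[s + 1]  # bootstrap from the successor's learned value
            values[s] += alpha * (target - values[s])
    return values


chain = ["work", "money", "cookie", "eat cookie"]  # hypothetical chain of states
values = td0_chain(len(chain))
print([round(v, 3) for v in values])  # [0.729, 0.81, 0.9, 1.0]
```

Here "money" ends up valued even though it is never directly rewarded, which is one way to model a learned material goal built on a predicted emotional reward.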