LessWrong 2.0 Reader

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (5)

[link] Legalize butanol?
bhauth · 2023-12-20T14:24:33.849Z · comments (20)

[link] Understanding Gödel’s completeness theorem
jessicata (jessica.liu.taylor) · 2024-05-27T18:55:02.079Z · comments (0)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

[link] [Paper] Language Models Don't Learn the Physical Manifestation of Language
Bruce W. Lee (bruce-lee) · 2024-02-22T18:52:32.237Z · comments (23)

Medical Roundup #3
Zvi · 2024-07-09T13:10:06.862Z · comments (4)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (17)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (3)

Forget Everything (Statistical Mechanics Part 1)
J Bostock (Jemist) · 2024-04-22T13:33:35.446Z · comments (6)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (4)

Prepsgiving, A Convergently Instrumental Human Practice
JenniferRM · 2023-11-23T17:24:56.784Z · comments (0)

Apply to the PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

Logical Line-Of-Sight Makes Games Sequential or Loopy
StrivingForLegibility · 2024-01-19T04:05:44.782Z · comments (0)

Instrumental deception and manipulation in LLMs - a case study
Olli Järviniemi (jarviniemi) · 2024-02-24T02:07:01.769Z · comments (13)

[link] Conflict in Posthuman Literature
Martín Soto (martinsq) · 2024-04-06T22:26:04.051Z · comments (1)

Individually incentivized safe Pareto improvements in open-source bargaining
Nicolas Macé (NicolasMace) · 2024-07-17T18:26:43.619Z · comments (2)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (11)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (59)

Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

Recent comments

yoav-ravid on Could randomly choosing people to serve as representatives lead to better government?

I recently read Terry Bouricius's book about sortition, and I highly recommend it. (It's completely free on his Substack.)

thomascederborg on The ELYSIUM Proposal - Extrapolated voLitions Yielding Separate Individualized Utopias for Mankind

I thought that your Cosmic Block proposal would only block information regarding things going on inside a given Utopia. I did not think that the Cosmic Block would subject every person to forced memory deletion. As far as I can tell, this would mean removing a large portion of all memories (details below). I think that memory deletion on the implied scale would seriously complicate attempts to define an extrapolation dynamic. It also does not seem to me that it would actually patch the security hole illustrated by the thought experiment in my original comment (details below).

The first section argues that (unless Bob's basic moral framework has been dramatically changed by the memory deletion) no level of memory deletion will prevent BPA from wanting to find and hurt Steve. In brief: BPA will still be subject to the same moral imperative to find and hurt any existing heretics (including Steve).

The second section argues that BPA is likely to find Steve. In brief: BPA is a clever AI and the memory deletion is a human constructed barrier (the Advocates are extrapolations of people that have already been subjected to these memory wipes, so Advocates cannot be involved when negotiating the rules governing these memory wipes). BPA would still have access to a lot of different information sources that it can use to find Steve.

The third section argues that if BPA finds Steve, then BPA would be able to hurt Steve. In brief: creating OldSteve is still not prevented by any rule or constraint that you have mentioned so far.

The fourth section argues that the side effects of memory deletion would be severe. In brief: memories of every conversation about any deleted person would also be deleted. Besides all direct memories involving any deleted person, many indirect memories would also be deleted. This would seriously complicate extrapolation. (Extrapolation is already a very tricky definitional problem. And this definitional problem cannot be delegated to extrapolated Advocates, since they are the result of the extrapolation dynamic being defined).

The last section deals with your proposed resource destruction mechanism. In brief: in one thought experiment (that did not involve threats) almost half the population would very strongly prefer to destroy ELYSIUM. This disproved your claim that such a situation would not arise. Also: no thought experiment involved anyone trying to gain resources. And no thought experiment involved anyone issuing any form of threat that would not be followed up on. People burned resources that they did not need. They did this either to hurt people directly or to issue genuinely credible threats.

BPA will still want to find and hurt heretics

It would not be surprising if a comprehensive, human defined, memory deletion operation would completely rewrite someone's basic moral framework. But I'm guessing that your proposed memory deletion is not supposed to be done in a way that changes a person's basic moral framework. So let's reason from the assumption that it does not.

This means that Bob still considers BPA to be bound by a non negotiable moral imperative. So BPA still wants to find and punish any heretic that might exist.

In other words: the Cosmic Block is not the type of mechanism that might prevent BPA from wanting to find and hurt Steve. It is instead a practical obstacle that BPA needs to overcome (which is something very different). It is a human constructed practical barrier, that is supposed to protect Steve from a clever AI that wants to find and hurt Steve.

BPA will likely be able to find Steve

Unless the details of your proposed Cosmic Block are constructed by an AI that prevents All Bad Things, these rules must come from somewhere else. AI assisted negotiations cannot be done by the Advocates. Advocates are the result of extrapolating memory wiped people (otherwise the whole point of the Cosmic Block is lost). So the Advocates cannot be involved in defining the memory wipe rules.

In other words: unless the memory wipe rules are negotiated by a completely separate set of (previously unmentioned) AIs, the memory wipe rules will be human defined.

This means that a human constructed barrier must hold against a clever AI trying to get around it. Even if we were to know that a human defined barrier has no humanly-findable security holes, this does not mean that it will actually hold against a clever AI. A clever AI can find security holes that are not humanly-findable.

The specific situation that BPA will find itself in does not seem to be described in sufficient detail for it to be possible to outline a specific path along which BPA finds Steve. But from the currently specified rules, we do know that BPA has access to several ways of gathering information about Steve.

People can pool resources (as described in your original proposal). So Advocates can presumably ask other Advocates about potential partners for cohabitation. Consider the case where BPA is negotiating with other Advocates regarding who will be included in a potential shared environment. This decision will presumably involve information about potential candidates. Whether or not a given person is accepted, would presumably depend on detailed personal information.

Advocates can also engage in mutual resource destruction to prevent computations happening within other Utopias. You describe this mechanism as involving negotiations between Advocates, regarding computations happening within other people's Utopias. Such negotiations would primarily be between the Advocates of people that have very different values. This is another potential information source about Steve.

Steve would also have left a lot of effects on the world, besides effects on people's memories. Steve might for example have had a direct impact on what type of person someone else has turned into. Deleting this impact would be even more dramatic than deleting memories.

Steve might also have had a significant impact on various group dynamics (for example: his family, the friend groups that he has been a part of, different sets of coworkers and classmates, online communities, etc). Unless all memories regarding the general group dynamics of every group that Steve has been a part of are deleted, Steve's life would have left behind many visible effects.

The situation is thus that a clever AI is trying to find and hurt Steve. There are many different types of information sources that can be combined in clever ways to find Steve. The rules of all barriers between this AI and Steve are human constructed. Even with perfect enforcement of all barriers, this still sounds like a scenario where BPA will find Steve (for the same reason that a clever AI is likely to find its way out of a human constructed box, or around a human constructed Membrane).

There is still nothing protecting Steve from BPA

If BPA locates Steve, then there is nothing preventing BPA from using OldSteve to hurt Steve. What is happening to OldSteve is still not prevented by any currently specified rule. The suffering of OldSteve is entirely caused by internal dynamics. OldSteve never lacks any form of information. And the harm inflicted on OldSteve is not in any sense marginal.

I do not see any strong connections between the OldSteve thought experiment and your Scott Alexander quote (which is concerned with the question of what options and information should be provided, by a government run by humans, to children raised by other humans). More generally: scenarios that include a clever AI that is specifically trying to hurt someone have a lot of unique properties (important properties that are not present in scenarios that lack such an AI). I think that these scenarios are dangerous. And I think that they should be avoided (as opposed to first created and then mitigated). (Avoiding such scenarios is a necessary, but definitely not sufficient, feature of an alignment target.)

Memory wipes would complicate extrapolation

All deleted memories must be so thoroughly wiped that a clever AI will be unable to reconstruct them (otherwise the whole point of the Cosmic Block is negated). Deleting all memories of a single important negative interpersonal relationship would be a huge modification. Even just deleting all memories of one famous person that served as a role model would be significant.

Thoroughly deleting your memory of a person would also impact your memory of every conversation that you have ever had about this person, including conversations with people that are not deleted. Most long term social relationships involve a lot of discussions of other people (one person describing past experiences to the other, discussions of people that both know personally, arguments over politicians or celebrities, etc). Thus, the memory deletion would significantly alter the memories of essentially all significant social relationships. This is not a minor thing to do to a person. (That every person would be subjected to this is not obviously implied by the text in The ELYSIUM Proposal.)

In other words: even memories of non deleted people would be severely modified. For example: every discussion or argument about a deleted person would be deleted. Two people (that do not delete each other) might suddenly have no idea why they almost cut all contact a few years ago, and why their interactions have been so different for the last few years. Either their Advocates can reconstruct the relevant information (in which case the deletion does not serve its purpose). Or their Advocates must try to extrapolate them while lacking a lot of information.

Getting the definitions involved in extrapolation right seems like it will be very difficult even under ordinary circumstances. Wide ranging and very thorough memory deletion would presumably make extrapolation even more tricky. This is a major issue.

Your proposed resource destruction mechanism

No one in any of my thought experiments was trying to get more resources. The 55 percent majority (and the group of 10 people) have a lot of resources that they do not care much about. They want to create some form of existence for themselves. This only takes a fraction of available resources to set up. They can then burn the rest of their resources on actions within the resource destruction mechanism. They either burn these resources to directly hurt people. Or they risk these resources by making threats that are completely credible. In the thought experiments where someone does issue a threat, the threat is issued because: a person giving in > burning resources to hurt someone who refuses > leaving someone that refuses alone. They are perfectly ok with an outcome where resources are spent on hurting someone that refuses to comply (they are not self modifying as a negotiation strategy. They just think that this is a perfectly ok outcome).

Preventing this type of threat would be difficult because (i): negotiations are allowed, and (ii): in any scenario where threats are prevented, the threatened action would simply be taken (for non strategic reasons). There is no difference in behaviour between scenarios where threats are prevented, and scenarios where threats are ignored.

The thought experiment where a majority burns resources to hurt a minority was a simple example scenario where almost half of the population would very strongly prefer to destroy ELYSIUM (or strongly prefer that ELYSIUM was never created). It was a response to your claim that your resource destruction mechanisms would prevent such a scenario. This thought experiment did not involve any form of threat or negotiation.

Let's call a rule that prevents the majority from hurting the minority a Minority Protection Rule (MPR). There are at least two problems with your claim that a pre-AI majority would prevent the creation of a version of ELYSIUM that has an MPR.

First: without an added MPR, the post-AI majority is able to hurt the minority without giving up anything that they care about (they burn resources they don't need). So there is no reason to think that an extrapolated post-AI majority would want to try to prevent the creation of a version of ELYSIUM with an MPR. They would prefer the case without an MPR. This does not imply that they care enough to try to prevent the creation of a version of ELYSIUM with an MPR. Doing so would presumably be very risky, and they don't gain anything that they care much about. When hurting the minority does not cost them anything that they care about, they do it. That does not imply that this is an important issue for the majority.

More importantly however: you are conflating, (i): a set of un-extrapolated and un-coordinated people living in a pre-AI world, with (ii): a set of clever AI Advocates representing these same people, operating in a post-AI world. There is nothing unexpected about humans opposing / supporting an AI that would be good / bad for them (from the perspective of their extrapolated Advocates). That is the whole point of having extrapolated Advocates.

cstinesublime on leogao's Shortform

Perhaps I misunderstand your use of the phrase "intentionally ignorant", but I believe many people who are seen to have acted with "integrity" were hyperaware and well informed of the normal social conventions in a given environment and made a deliberate choice not to adhere to them, rather than ignoring said conventions out of a lack of interest.

I also am not sure what you mean by "weird". I assume you mean any behavior which is not the normal convention of any randomly selected cohesive group of people, from a family, to a local soccer club, to an informal but tight knit circle of friends, to a department of a large company. Have I got that right?

My idea of 'weird' tends to involve the stereotypical artists and creatives I associate with, which is, within those circles, not weird at all but normal. But I'm meta-aware that might be a weird take.

green_leaf on The Personal Implications of AGI Realism

(Thanks to the Hayflick limit, only some lines can go on indefinitely.)

gordon-seidoh-worley on [Intuitive self-models] 6. Awakening / Enlightenment / PNSE

As of late July last year, "I" am in PNSE. A few comments.

First, no major errors or concerns when reading the post. I might have missed something, but nothing triggered the "this is misunderstanding what PNSE is fundamentally like" alarm.

Second, there are a lot of ways PNSE is explained. I like this short version: "I am me". That is, "I", the subject of experience, no longer experiences itself as subject, but rather as object, i.e. "me". It's like having a third-person experience of the self. I also like to describe it as thought becoming a sense, like vision or hearing, because "I" no longer do the thinking; instead this person does the thinking to me.

Third, not everyone describes it this way, but in Zen we call the transition into PNSE the Great Death because it literally feels like dying. It's not dissimilar from the ego death people experience on drugs like LSD, but ego "death" is better described as ego "sleep" because it comes back and, after it's happened once, the mind knows the ego is going to come back, whereas in the Great Death the sense of separate self is gone and not coming back. All that said, many with PNSE don't experience a violent transition like this, so the Great Death or something like it may be a contingent feature of some paths to PNSE and not others.

Fourth, I don't remember if the paper discusses this, and this is controversial among some Buddhist traditions, but PNSE doesn't mean the mind is totally liberated from belief in a separate self. You said the homunculus concept lies dormant, but I'd say it does more than that. The mind is filled with many beliefs that presuppose the existence of the homunculus, and even if the homunculus is no longer part of experiences of the world, it's still baked into habits of behavior, and it takes significant additional work once in PNSE to learn new habits, ones that don't have the homunculus baked into them, to replace the old ones. Very few people ever become free of all of them, and maybe literally no one does as long as they continue to live.

Fifth and finally, PNSE is great, I'm glad it's how I am now. It's also fine not to be in it, because even if you believe you have a homunculus, in an absolute sense you already don't, you're just confused about how the world works, and that's okay, we're all confused. PNSE is also confused, but in different ways, and with fewer layers of confusion. So if you read this post and are now excited to try for PNSE, great, do it, but be careful. Lots of people Goodhart on what they think PNSE is because they try too hard to get it. If PNSE doesn't sneak up on you, then be extra suspicious of Goodharting! (Actually, just also be suspicious that you've Goodharted yourself!)

milanrosko on BIG-Bench Canary Contamination in GPT-4

Question: How does this idea guarantee that the contamination did not happen on purpose, or accidentally through articles like this? (Not speaking for the companies, since I am quite sure that they don't care... Just a practical consideration.)

nutrition-capsule on Overcoming Bias Anthology

I wouldn't equate Robin's perspectives on culture with reactionary movements or conservatism. If anything, he seems quite open to radical transformations of society (e.g. futarchy to replace parliamentarism, bounty systems and vouching to replace policing, private insurance policies to replace welfare policies etc.).

Whereas (neo-)reactionary / conservative thought often simply intends to return to some previous status quo, Robin does not profess to represent such views and has not proposed such solutions. In fact, as far as I'm aware he hasn't proposed any solutions at all as of yet.

political-therapy on leogao's Shortform

What do you believe, then, is the most effective way of achieving a goal?

lsusr on [Intuitive self-models] 6. Awakening / Enlightenment / PNSE

I think the answer is: the homunculus concept has a special property of being intrinsically attention-grabbing…. The homunculus is thus impossible to ignore—if the homunculus concept gets activated at all, it jumps to center stage in our minds.

I don't fully understand this bit. I feel like I'm reading a mathematical proof where the author leaves out steps that are trivial to the author, but not to me.

dentosal on TurnTrout's shortform feed

I'd go a step beyond this: merely following incentives is amoral. It's the default. In a sense, moral philosophy discusses when and how you should go against the incentives. Superhero Bias resonates with this idea, but from a different perspective.