LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] Twitter thread on AI takeover scenarios
Richard_Ngo (ricraz) · 2024-07-31T00:24:33.866Z · comments (0)
Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)
A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)
Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)
The Laws of Large Numbers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-04T11:54:16.967Z · comments (9)
Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)
Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)
[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)
An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)
Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)
Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (8)
Childhood and Education Roundup #5
Zvi · 2024-04-17T13:00:03.015Z · comments (4)
The Evolution of Humans Was Net-Negative for Human Values
Zack_M_Davis · 2024-04-01T16:01:10.037Z · comments (1)
[question] What are your cruxes for imprecise probabilities / decision rules?
Anthony DiGiovanni (antimonyanthony) · 2024-07-31T15:42:27.057Z · answers+comments (33)
Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)
Closeness To the Issue (Part 5 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-09T00:36:47.388Z · comments (0)
[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)
My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (46)
Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)
On Dwarkesh’s 3rd Podcast With Tyler Cowen
Zvi · 2024-02-02T19:30:05.974Z · comments (9)
But Where do the Variables of my Causal Model come from?
Dalcy (Darcy) · 2024-08-09T22:07:57.395Z · comments (1)
[link] UC Berkeley course on LLMs and ML Safety
Dan H (dan-hendrycks) · 2024-07-09T15:40:00.920Z · comments (1)
AI companies' commitments
Zach Stein-Perlman · 2024-05-29T11:00:31.339Z · comments (0)
AI #47: Meet the New Year
Zvi · 2024-01-13T16:20:10.519Z · comments (7)
[link] Claude 3 Opus can operate as a Turing machine
Gunnar_Zarncke · 2024-04-17T08:41:57.209Z · comments (2)
Doomsday Argument and the False Dilemma of Anthropic Reasoning
Ape in the coat · 2024-07-05T05:38:39.428Z · comments (55)
Drone Wars Endgame
RussellThor · 2024-02-01T02:30:46.161Z · comments (71)
Good job opportunities for helping with the most important century
HoldenKarnofsky · 2024-01-18T17:30:03.332Z · comments (0)
AI Safety Camp final presentations
Linda Linsefors · 2024-03-29T14:27:43.503Z · comments (3)
Introduce a Speed Maximum
jefftk (jkaufman) · 2024-01-11T02:50:04.284Z · comments (28)
[link] Searching for the Root of the Tree of Evil
Ivan Vendrov (ivan-vendrov) · 2024-06-08T17:05:53.950Z · comments (14)
[link] Toki pona FAQ
dkl9 · 2024-03-17T21:44:21.782Z · comments (8)
A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)
[link] The Way According To Zvi
Sable · 2024-12-07T17:35:48.769Z · comments (0)
Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (11)
Deep Learning is cheap Solomonoff induction?
Lucius Bushnaq (Lblack) · 2024-12-07T11:00:56.455Z · comments (1)
Grammars, subgrammars, and combinatorics of generalization in transformers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T09:37:23.191Z · comments (0)
Fertility Roundup #4
Zvi · 2024-12-02T14:30:05.968Z · comments (16)
[question] Which Biases are most important to Overcome?
abstractapplic · 2024-12-01T15:40:06.096Z · answers+comments (24)
[link] Is the AI Doomsday Narrative the Product of a Big Tech Conspiracy?
garrison · 2024-12-04T19:20:59.286Z · comments (1)
A sketch of acausal trade in practice
Richard_Ngo (ricraz) · 2024-02-04T00:32:54.622Z · comments (4)
Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)
What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate
Johannes C. Mayer (johannes-c-mayer) · 2024-01-03T13:22:11.700Z · comments (12)
Predictive model agents are sort of corrigible
Raymond D · 2024-01-05T14:05:03.037Z · comments (6)
My Detailed Notes & Commentary from Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:51.894Z · comments (16)
Dangers of Closed-Loop AI
Gordon Seidoh Worley (gworley) · 2024-03-22T23:52:22.010Z · comments (9)
[link] OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors
[deleted] · 2024-06-13T21:28:18.110Z · comments (10)
How predictive processing solved my wrist pain
max_shen (makoshen) · 2024-07-04T01:56:20.162Z · comments (8)
Agency in Politics
Martin Sustrik (sustrik) · 2024-07-17T05:30:01.873Z · comments (2)
Humans aren't fleeb.
Charlie Steiner · 2024-01-24T05:31:46.929Z · comments (5)