LessWrong 2.0 Reader

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

Highlights from Lex Fridman’s interview of Yann LeCun
[deleted] · 2024-03-13T20:58:13.052Z · comments (15)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (13)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)

Thoughts on "The Offense-Defense Balance Rarely Changes"
Cullen (Cullen_OKeefe) · 2024-02-12T03:26:50.662Z · comments (4)

AI doing philosophy = AI generating hands?
Wei Dai (Wei_Dai) · 2024-01-15T09:04:39.659Z · comments (22)

Some costs of superposition
Linda Linsefors · 2024-03-03T16:08:20.674Z · comments (11)

[link] Metascience of the Vesuvius Challenge
Maxwell Tabarrok (maxwell-tabarrok) · 2024-03-30T12:02:38.978Z · comments (2)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)

Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)

On the Proposed California SB 1047
Zvi · 2024-02-12T16:40:04.854Z · comments (18)

I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (13)

1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)

[link] The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots"
titotal (lombertini) · 2024-01-14T15:03:21.087Z · comments (6)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

How to hire somebody better than yourself
lemonhope (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

Trustworthy and untrustworthy models
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

Recent comments

avturchin on Weird QM Interpretation tries to Solve both Fermi and SA?

I don't see the claim about merging universes in the linked Wei Dai text.

sloonz on In Defense of a Butlerian Jihad

This (a) doesn't have anything in particular to do with Christianity, (b) has been the most widely held view among people in general since forever, and (c) seems obviously correct. If you want to rely on the contrary supposition, I'm afraid you're going to have to argue for it.

Yes, I agree that this is the least obviously "wrong" of the three "copes", and I’ll merge it with the next remark. It’s very hard to answer. I’ll start with the simple answer that will convince some, but perhaps not everyone:

I am very low on the "Negative Utilitarianism" scale. I really don’t care much about minimizing suffering in the universe. A bit, sure, but not that much. Still, I recognize it is very important to some people, and my current best rules for creating a "Best Model of Human Values" say these people count, so that’s a pretty good Existence Proof that it’s a Pretty Important Value even if I don’t feel it much myself.

So I am going to give you the exact same Existence Proof: I notice that if you give me everything else (Hedonistic Happiness, Justice, Health, etc.) and take away Agency (which means having things to do that go beyond "having a hobby"), the value to me of my existence is not 0, but not that far above 0. If I live in such a society and we need to sacrifice some individuals, I will happily step in, "nothing of value was lost" style. If I live in such a society and Omega appears and announces "Sorry, Vacuum Decay Bubble incoming, everyone is going to disappear in exactly 3 minutes", I will certainly feel bad for "everyone", who is apparently pretty happy, but I will also think "well, I was already pretty much dead inside anyway".

Please define it in a succinct, relevant, and unambiguous way

I’m afraid you would have to pick only two adjectives, should you want to ask this of someone smarter and more articulate than me but with the same views. Alas, you’re stuck with me, so we’ll have to pick one. Let’s pick "relevant".

It will also be very hand-wavy. Despite all that, it’s still the best I can do. Sorry.

Take the Utility Function $U_i$ of some person $i$. We can split it roughly into three parts:

$$U_i = D_i + \sum_{j \neq i} P_{ij} U_j$$

$D_i$ is "direct" utility. I’m hungry, I want ice cream. Would love to go to that concert. Just got an idea, can I build it?

$P_{ij}$ is "how much $i$ cares about $j$".

$U_j$ is the Utility Function of $j$.

This is, roughly speaking, a very rough first approximation of "how to model egoism and altruism". Yes, I’m fully aware that this is far from capturing most interpersonal relationships and utility. I still think it’s relevant for pointing at the Big Picture, namely: if A only cares about B (all other $P_{Ax}$ are 0) and B only cares about A (same), then if $D_A = 0$ and $D_B = 0$, there is no utility left. Or: a world of Pure Altruists who only care about others is a worthless world.

Which is not the same as saying that Altruism is Worthless. As long as you have at least some nonzero $D$ somewhere, Altruism can create arbitrarily large values of Utility, acting as a multiplying force. That’s why it’s such a potent Human Value.
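The altruism argument above can be checked numerically with a minimal Python sketch, assuming the additive model $U_i = D_i + \sum_{j \neq i} P_{ij} U_j$ built from the three parts of the utility function ($D_i$ direct utility, $P_{ij}$ how much $i$ cares about $j$). The function name `solve_utilities` and all numbers are hypothetical illustrations, and the sketch assumes the caring weights are small enough (below 1 here) that the system has a unique solution.

```python
def solve_utilities(direct, care):
    """Solve U = D + P U, i.e. U_i = D_i + sum_j care[i][j] * U_j.

    direct: list of direct utilities D_i (hypothetical numbers).
    care:   square matrix where care[i][j] is how much i cares about j
            (diagonal expected to be 0, since self-interest lives in D_i).
    Rearranges to (I - P) U = D and solves by Gauss-Jordan elimination.
    """
    n = len(direct)
    # Augmented matrix [I - P | D]
    a = [[(1.0 if i == j else 0.0) - care[i][j] for j in range(n)] + [float(direct[i])]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("I - P is singular: no unique utility assignment")
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


# Two pure altruists who each care 0.9 about the other
care = [[0.0, 0.9], [0.9, 0.0]]
print(solve_utilities([0.0, 0.0], care))  # no direct utility anywhere
print(solve_utilities([1.0, 0.0], care))  # one unit of direct utility for A
```

With no direct utility anywhere, mutual caring of 0.9 leaves $U_A = U_B = 0$; a single unit of $D_A$ becomes about 5.26 for A and 4.74 for B, and raising both weights to 0.99 pushes $U_A$ past 50, which is the "multiplying force" in action.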

Now let’s go even further into hand-waviness: this generalizes. Many Values are similarly powerless at creating Utility from Nothing; they just act as Force Multipliers.

I call "Meaningful Values" the ones that can create Utility by themselves, without relying on other values being present in the first place. This does not mean that the others (let’s call them Amplifying Values) are meaningless, to be clear. They just become meaningless if you have zero Meaningful Values hanging around.

In short: I’m very afraid that we’re putting a lot of load-bearing "we’ll be fine, there’s still value" on mostly Amplifying Values.

When I said above "I notice that in a world where I don’t have meaningful Agency, I don’t put much value on my own existence", I was not saying "I do not like hobbies". I happen to like hobbies, in this world! I’m also self-reflective enough to see that what I like in hobbies is the opportunity to grow, and the value of growing resolves into being better at Agency, which is a much better candidate for a "Meaningful Value" and a "Terminal Value". Hence, if you throw me into the UBI Paradise (let’s drop "Christian", it seems to annoy everyone), the value of hobbies goes to zero too, and I become a shell of my current self, despite my current self saying "hobbies are cool".

The democratic consensus also won't allow a Butlerian Jihad, and I don't think you're claiming that it will.

Okay, there are two things to unpack here.

First, judging from those answers, I believe I went too far on the Editorializing side of the Editorializing vs Being Precise tradeoff with the term "Butlerian Jihad", without even explaining what I meant. I will half-apologize for that; only half, because I didn’t intend the "Butlerian Jihad" to be the central point. The central point is that we’re not ready to tackle the problem of Human Values, but current AI timelines force us to. You can see it’s pretty dumb of me to put a non-central point in the title. I have no defense for that.

Second: by Butlerian Jihad, I do not mean "no AI, ever, forever"; I mostly mean "a very long pause, at capability levels far below AGI". I already feel bad about GPT-5 even if it does not reach human level. I’m not even sure I’m entirely fine with GPT-4.

Contra you and Zvi, I think that if GPT-5 leads to 80% job automation, the democratic consensus will be pretty much the Dune version of the Butlerian Jihad: no AI forever, and the guillotine for those who try. I would agree with you, Zvi, and probably everyone else on LessWrong that this is not a good outcome. I don’t think it’s a very interesting point of discussion either, so let’s drop it?

I'm actually not sure what you're arguing for or against in this whole section.

I’m essentially arguing against taking Human Values as some Abstract Adjectives like Happiness and Health and Equality and Justice and forgetting about… you know… the humans in the process.

What that has to do with justice destroying the world, I have absolutely no clue

It’s about Abstract Justice destroying humans (values) if you go too far in your Love for Justice and forget that they’re the reason we want Justice in the first place.

Some values have always won, and some values have always lost, and that will not change

Yes, which already raises an important point:

What value do we (who is "we"?) place on Diversity? On values which we do not personally hold but that seem to have a good place in our "Best Model of Human Values"? What about values which do not really fit in our "Best Model of Human Values", but which some other humans on the planet happen to put in their own "Best Model of Human Values"? What if that other human is your sworn enemy?

That is what I was trying to point at with my "exercise for the reader".

I think you're trying to take the view that any major change in the "human condition", or in what's "human", is equivalent to the destruction of the world, no matter what benefits it may have. This is obviously wrong

Oh, I will not dispute that it is wrong. Better the super-happies than the literal Void. Just not much better.

You seem a bit bitter about my "I won’t expand on that", "too long post", and so on. I’m sorry, but I spent two days on the post and have already spent 2 hours on one reply. I’m not a good or prolific writer. I have to pick what I spend my energy on.

So you're siding with the guy who killed 15 billion non-consenting people because he personally couldn't handle the idea of giving up suffering?

I initially didn’t want to reply to that. I don’t want to fight you. I’m only replying to illustrate how fast things can become difficult and conflictual. It doesn’t take much:

So you’re siding with the guy who is going to forcibly wirehead all sentient life in the universe, just because he can’t handle the idea that somewhere, someone is using their agency wrong and suffering as a result?

That being said, what now? Should we fight each other to the death for control of the AGI, to decide whether the universe will have Agency and Suffering, or no Agency and no Suffering?

Human Values have been changing, for individuals and in the "average", for as long as there've been humans, including being discarded consciously or unconsciously. Mostly in a pretty aimless, drifting way.

A lot of it consciously too, but yes.

neither AI nor anything else will fundamentally change it.

Hard disagree on that (wait, is this the first real disagreement we have?). We can have the super-happies if we want to (or, for that matter, the baby-eaters). We couldn’t before. The super-happies do represent a fundamental change.

Before, we did not have much choice over diversity. Many people fought countless wars to reduce diversity in human values, without much overall success (some, yes, but not much in the grand scheme of things). In the AGI age, nothing forces whoever controls the AGI to care much about diversity. It will have to be a deliberate choice. And do you notice all the forces and values already arrayed against diversity? It does not bode well for those who value at least some diversity.

I haven't actually heard many people suggesting that.

That’s the "best guess of what we will do with AGI" from those building AGI.

vanessa-kosoy on Are there cognitive realms?

This post states and speculates on an important question: are there different mind types that are in some sense "fully general" (the author calls it "unbounded") but are nevertheless qualitatively different. The author calls these hypothetical mind taxa "cognitive realms".

This is how I think about this question, from within the LTA (learning-theoretic agenda):

To operationalize "minds", we should be thinking of learning algorithms. Learning algorithms can be classified according to their "syntax" and "semantics" (my own terminology). Here, semantics refers to questions such as (i) what type of object the algorithm is learning, (ii) what feedback/data is available to the algorithm, and (iii) what the success criterion/parameter of the algorithm is. On the other hand, syntax refers to the prior and/or hypothesis class of the algorithm (where the hypothesis class might be parameterized in a particular way, with particular requirements on how the learning rate depends on the parameters).
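To make the syntax/semantics split concrete, here is an illustrative Python sketch (a toy, not the LTA's actual formalism): the semantics is held fixed (predict the next bit of a binary sequence, receive the true bit as feedback, score by cumulative log-loss), while the syntax, a finite class of Bernoulli hypotheses under a uniform prior, is swapped out. The function names, the data, and both hypothesis classes are hypothetical.

```python
import math


def mixture_log_loss(bits, hypotheses):
    """Cumulative log-loss (nats) of a Bayesian mixture predictor.

    Semantics (fixed): online prediction of a binary sequence, scored by log-loss.
    Syntax (swappable): `hypotheses`, Bernoulli biases P(bit = 1) strictly
    between 0 and 1, with a uniform prior over them.
    """
    weights = [1.0 / len(hypotheses)] * len(hypotheses)
    total = 0.0
    for b in bits:
        # Posterior-weighted probability that the next bit is 1
        p_one = sum(w * h for w, h in zip(weights, hypotheses))
        total -= math.log(p_one if b == 1 else 1.0 - p_one)
        # Bayes update on the observed bit, then renormalize
        weights = [w * (h if b == 1 else 1.0 - h) for w, h in zip(weights, hypotheses)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return total


def best_in_class_log_loss(bits, hypotheses):
    """Log-loss of the single best fixed hypothesis in hindsight."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return min(-(ones * math.log(h) + zeros * math.log(1.0 - h)) for h in hypotheses)


data = [1, 1, 0, 1, 1] * 20  # hypothetical sequence, 80% ones
for name, syntax in [("coarse", [0.25, 0.5, 0.75]), ("fine", [k / 10 for k in range(1, 10)])]:
    print(name, mixture_log_loss(data, syntax), best_in_class_log_loss(data, syntax))
```

For a uniform prior over a finite class $H$, the mixture's loss is never below the best hypothesis's loss and exceeds it by at most $\ln |H|$, so changing the syntax changes what the learner can compete with while the task and success criterion stay the same.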

Among different semantics, we are especially interested in those that are in some sense agentic. Examples include reinforcement learning, infra-Bayesian reinforcement learning, metacognitive agents and infra-Bayesian physicalist agents.

Do different agentic semantics correspond to different cognitive realms? Maybe, but maybe not: it is plausible that most of them are reflectively unstable. For example, Christiano's malign prior might be a mechanism for how all agents converge to infra-Bayesian physicalism.

Agents with different syntaxes are another candidate for cognitive realms. Here, the question is whether there is an (efficiently learnable) syntax that is in some sense "universal": all other (efficiently learnable) syntaxes can be efficiently translated into it. This is a wide open question. (See also "frugal universal prior".)

In the context of AI alignment, in order to achieve superintelligence it is arguably sufficient to use a syntax equivalent to whatever is used by human brain algorithms. Moreover, it's plausible that any algorithm we can come up with can only have an equivalent or weaker syntax (the process of us discovering the new syntax suggests an embedding of the new syntax into our own). Therefore, even if there are many cognitive realms, for our purposes we mostly care about only one of them. However, the multiplicity of realms has implications for how simple/natural/canonical we should expect the choice of syntax for our theory of agents to be (the fewer realms, the more canonical).

quetzal_rainbow on Daniel Tan's Shortform

The chess tree looks like a classic example. Each node is a board state; edges are legal moves. Working heuristics in move evaluators can be understood as a sort of theorem: "if such-and-such algorithm recognizes this state, that is evidence in favor of White winning at 1.5:1". Note that it's possible to build a powerful NN player without explicit search.
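The "heuristic as theorem" idea can be sketched in Python on a game small enough to search exhaustively. The game here is a hypothetical stand-in for chess (players alternately remove 1 or 2 stones; whoever takes the last stone wins): positions are nodes, legal moves are edges, and a cheap recognizer (`pile % 3 != 0` means the side to move wins) is checked against full search. Unlike the probabilistic 1.5:1 evaluator in the comment, this recognizer is exact, so it is a theorem in the strict sense; and because positions can be reached by different move orders, the position graph is strictly a DAG (unrolling it by move history gives the tree).

```python
from functools import lru_cache

MOVES = (1, 2)  # hypothetical toy game: remove 1 or 2 stones; taking the last stone wins


@lru_cache(maxsize=None)
def side_to_move_wins(pile: int) -> bool:
    """Exhaustive search: nodes are positions (pile sizes), edges are legal moves."""
    # Winning iff some legal move leaves the opponent in a losing position;
    # with no legal move (pile == 0) the side to move has already lost.
    return any(not side_to_move_wins(pile - m) for m in MOVES if m <= pile)


def heuristic(pile: int) -> bool:
    """Cheap recognizer that predicts the outcome without any search."""
    return pile % 3 != 0


def heuristic_is_theorem(max_pile: int) -> bool:
    """True if the recognizer agrees with full search on every position up to max_pile."""
    return all(heuristic(p) == side_to_move_wins(p) for p in range(max_pile + 1))


print(heuristic_is_theorem(60))
```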

edouard-harris on Policymakers don't have access to paywalled articles

Because of another stupid thing, which is that U.S. depts & agencies have strong internal regs against employees soliciting and/or accepting gifts other than in carefully carved out exceptional cases. For more on this, see, e.g., 5 CFR § 2635.204, but this isn't the only such reg. In practice U.S. government employees at all levels are broadly prohibited from accepting any gift with a market value above 20 USD for example. (As you'd expect this leads to a lot of weird outcomes, including occasional hilarious minor diplomatic incidents with inexperienced foreign counterparties who have different gift giving norms.)

lorec on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

I didn't say he wasn't overrated. I said he was capable of physics.

Did you read the linked post? Bohm, Aharonov, and Bell misunderstood EPR. Bohm's and Aharonov's formulation of the thought experiment is easier to "solve" but does not actually address EPR's concern, which is that mutual non-commutation of x-, y-, and z-spin implies hidden variables must not be superfluous. Again, EPR were fine with mutual non-commutation, and fine with entanglement. What they were pointing out was that the two postulates don't make sense in each other's presence.
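For reference, the non-commutation of spin components the comment refers to is the standard textbook relation for a spin-1/2 particle (general background, not taken from the linked post), with $\sigma_k$ the Pauli matrices:

$$S_k = \tfrac{\hbar}{2}\,\sigma_k, \qquad [S_x, S_y] = i\hbar\, S_z, \quad [S_y, S_z] = i\hbar\, S_x, \quad [S_z, S_x] = i\hbar\, S_y$$

Since no two components commute, no spin-1/2 state is a simultaneous eigenstate of two different spin components.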

daniel-tan on Daniel Tan's Shortform

That’s interesting! What would be some examples of axioms and theorems that describe a directed tree?

nathan-helm-burger on No one has the ball on 1500 Russian olympiad winners who've received HPMOR

Ok, probably a silly idea but... Maybe have some kind of competition for young people involving something like math / computer science / essay writing / puzzle solving / jailbreaking LLMs... Give a cash prize for the top three, and send books to a bunch of the runners up.

Sounds like something that would take a lot of organization effort, so somebody would need to be excited enough about this idea to want to spearhead it.

morpheus on The purposeful drunkard

Yes!

dmitry-vaintrob on The purposeful drunkard

Do the images load now?