LessWrong 2.0 Reader

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)

Childhood and Education Roundup #5
Zvi · 2024-04-17T13:00:03.015Z · comments (3)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (61)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (1)

[link] Locally optimal psychology
Chipmonk · 2024-11-25T18:35:11.985Z · comments (7)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Orca communication project - seeking feedback (and collaborators)
Towards_Keeperhood (Simon Skade) · 2024-12-03T17:29:40.802Z · comments (16)

[link] The Way According To Zvi
Sable · 2024-12-07T17:35:48.769Z · comments (0)

Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety
Lauren Greenspan (LaurenGreenspan) · 2025-01-07T03:08:51.447Z · comments (2)

The quantum red pill or: They lied to you, we live in the (density) matrix
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-17T13:58:16.186Z · comments (34)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (37)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (6)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (6)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (11)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

Doing Research Part-Time is Great
casualphysicsenjoyer (hatta_afiq) · 2024-11-22T19:01:15.542Z · comments (7)

We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (17)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

Childhood and Education #8: Dealing with the Internet
Zvi · 2025-01-06T14:00:09.604Z · comments (7)

An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)

A Sober Look at Steering Vectors for LLMs
Joschka Braun (joschka-braun) · 2024-11-23T17:30:00.745Z · comments (0)

A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (46)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (8)

Grammars, subgrammars, and combinatorics of generalization in transformers
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-02T09:37:23.191Z · comments (0)

Fireplace and Candle Smoke
jefftk (jkaufman) · 2025-01-01T01:50:01.408Z · comments (4)

A Straightforward Explanation of the Good Regulator Theorem
Alfred Harwood · 2024-11-18T12:45:48.568Z · comments (3)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (11)

Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)

Don’t Legalize Drugs
Declan Molony (declan-molony) · 2025-01-14T06:51:14.005Z · comments (9)

2024 was the year of the big battery, and what that means for solar power
transhumanist_atom_understander · 2025-02-01T06:27:39.082Z · comments (1)

AI #98: World Ends With Six Word Story
Zvi · 2025-01-09T16:30:07.341Z · comments (2)

Why We Need More Shovel-Ready AI Notkilleveryoneism Megaproject Proposals
Peter Berggren (peter-berggren) · 2025-01-20T22:38:26.593Z · comments (1)

Recent comments

tailcalled on evhub's Shortform

Is there really some particular human whose volition you'd like to coherently extrapolate over eternity but where you refrain because you're worried it will generate infighting? Or is it more like, you can't think of anybody you'd pick, so you want a decision procedure to pick for you?

If there is some particular human, who is it?

daniel-herrmann on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Thanks for raising this important point. When modeling these situations carefully, we need to give terms like "today" a precise semantics that's well-defined for the agent. With proper semantics established, we can examine what credences make sense under different ways of handling indexicals. Matthias Hild's paper "Auto-epistemology and updating" demonstrates how to carefully construct time-indexed probability updates. We could then add centered worlds or other approaches for self-locating probabilities.

Some cases might lead to puzzles, particularly where epistemic fixed points don't exist. This might push us toward modeling credences differently or finding other solutions. But once we properly formalize "today" as an event, we can work on satisfying richness conditions. Whether this leads to inconsistent attitudes depends on what constraints we place on those attitudes - something that reasonable people might disagree about, as debates over sleeping beauty suggest.

christiankl on Thread for Sense-Making on Recent Murders and How to Sanely Respond

Pasek couchsurfed at my place in the days after a LessWrong Community Weekend in Berlin. That was before he went to the Bay Area, so probably 8 or 9 years ago. It was also before he seemed to make contact with Ziz, which happened after Pasek left the Bay Area and moved into a group house with other rationalists in Gran Canaria. Pasek's contact with Ziz seemed to be mostly online while he lived in Gran Canaria.

If you read Pasek's post where he thinks about committing suicide, there's plenty of TDT-thinking in it. It matched my idea of how Pasek thought even before he engaged with Ziz.

Pasek was a TDT-ish vegan.

Pasek tracked, QS-style and on paper, how he spent every waking hour of the day, and he seemed not to suffer from akrasia when guiding his actions.

If I remember right, he said that stealing is okay in cases where the TDT calculation favors stealing, even where traditional morality would say stealing is bad. I don't think that resulted in Pasek actually stealing things, but we did talk about some case where he thought stealing was justified, which surprised me at the time. My memory here is very fuzzy.

nancylebovitz on A sense of logic

I dislike raw oysters quite a bit, but they're okay cooked.

Speaking of logical fallacies, the fact that one person loves a thing doesn't make it especially likely that other people will even tolerate it. I don't know that people even have an obligation to try things other people love.

And yet, the temptation to think that other people do or should love what one loves is very strong. "I think this is great!" just doesn't feel as true as "This is great!".

ape-in-the-coat on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

I'm not sure if I fully understand why this is supposed to pose a problem, but maybe it helps to say that by "meaningfully consider" we mean something like, is actually part of the agent's theory of the world. In your situation, since the agent is considering which envelope to take, I would guess that to satisfy richness she should have a credence in the proposition.

Okay, then I believe you definitely have a problem with this example, and I would be glad to show you exactly where.

I think (maybe?) what makes this case tricky or counterintuitive is that the agent seems to lack any basis for forming beliefs about which envelope contains the money - their memory is erased each time and the location depends on their previous (now forgotten) choice.
However, this doesn't mean they can't or don't have credences about the envelope contents. From the agent's subjective perspective upon waking, they might assign 0.5 credence to each envelope containing the money, reasoning that they have no information to favor either envelope.

Let's suppose that the agent does exactly that. Suppose they believe that on every awakening there is a 50% chance that the money is in envelope 1. Then picking envelope 1 every time will, in expectation, lead to winning $350 per experiment.

But this is clearly false. The experiment is specifically designed in such a manner that the agent can win money only on the first awakening. On every other day (6 times out of 7) the money would be in envelope 2.

So should the agent believe that there is only a 1/7 chance that the money is in envelope 1? Also no, and I suppose you can see why. As soon as the agent tries to act on such a belief, it will turn out that 6 times out of 7 the money is in envelope 1.
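A small simulation can make this arithmetic concrete. The comment doesn't spell out the placement rule here, so the sketch assumes a hypothetical one chosen to reproduce the numbers quoted: on the first awakening the money is in whichever envelope the agent picks, and on every later awakening it is in the envelope the agent did not pick the day before. The $100 prize is also an assumption, inferred from the $350 figure (7 awakenings × 0.5 × $100). Because the agent's memory is erased, a fixed credence amounts to a fixed policy, modeled here as a function `choose` that always returns the same envelope.

```python
from fractions import Fraction

PRIZE = 100      # assumed prize, implied by $350 = 7 awakenings x 0.5 x $100
AWAKENINGS = 7


def run_experiment(choose):
    """Simulate one 7-day run under a HYPOTHETICAL placement rule.

    Assumed rule (not stated in the comment): on day 1 the money is in
    whichever envelope the agent picks; on each later day it is in the
    envelope the agent did NOT pick the day before.
    `choose` is a memoryless policy returning 1 or 2.
    Returns (total winnings, fraction of days the money was in envelope 1).
    """
    winnings = 0
    days_in_1 = 0
    previous_pick = None
    for _ in range(AWAKENINGS):
        pick = choose()
        if previous_pick is None:
            location = pick
        else:
            location = 2 if previous_pick == 1 else 1
        if location == 1:
            days_in_1 += 1
        if pick == location:
            winnings += PRIZE
        previous_pick = pick
    return winnings, Fraction(days_in_1, AWAKENINGS)


def predicted_winnings(credence_env1, pick):
    """Expected winnings if the per-day credence in envelope 1 were correct."""
    p_win = credence_env1 if pick == 1 else 1 - credence_env1
    return AWAKENINGS * p_win * PRIZE


if __name__ == "__main__":
    # Credence 1/2, always pick envelope 1: predicted 350, actual 100,
    # money in envelope 1 on 1/7 of days.
    print(predicted_winnings(Fraction(1, 2), pick=1), run_experiment(lambda: 1))
    # Credence 1/7, so always pick envelope 2: predicted 600, actual 100,
    # money in envelope 1 on 6/7 of days.
    print(predicted_winnings(Fraction(1, 7), pick=2), run_experiment(lambda: 2))
```

Under these assumptions, the observed frequency of "money in envelope 1" is set by the policy the belief produces, so neither credence ends up calibrated once it is acted on.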

In fact, we can notice that there is no coherent credence for the statement "Today the money is in envelope 1" that would not lead the agent to irrational behavior. This is because the term "Today" is not well-defined in the setting of such an experiment.

By which I mean that within the same iteration of the experiment, propositions including "Today" may not have a unique truth value. On the first day of the experiment the statement "Today the money is in envelope 1" may be true, while on the second day it may be false, so within a single 7-day iteration of the experiment the statement is both true and false!

Which means that "Today the money is in envelope 1" isn't actually an event in the event space of the experiment and therefore doesn't have a probability value, as the probability function's domain is the event space.

But this is a nuance of formal probability theory that most people do not notice, or even outright try to ignore. Our intuitions are accustomed to situations where statements about "Today" can be represented as well-defined events in the event space, and therefore we assume that they can always be "meaningfully considered".

And so if you try to base your decision theory framework on what feels meaningful to an agent instead of what is mathematically formalizable, you will end up with a bunch of paradoxical situations, like the one I've just described.

daniel-herrmann on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Ah, so not something like A being strongly preferred to B and B being strongly preferred to A, but more of a violation of transitivity. Then I still think the Broome paper is where I'd look, since you get exactly that kind of structure in preference aggregation.

The Bradley paper assumes everything is transitive throughout, so I don't think you get the kind of structure you want there. I'm not immediately aware of any work on that kind of inconsistency in JB outside the social choice context, but there might be some. I'll take a look.

There are ways to think about degrees and measures of incoherence, and how they connect to decision making. I'm thinking mainly of this paper by Schervish, Seidenfeld, and Kadane, Measures of Incoherence: How Not to Gamble if You Must. There might be a JB-style version of that kind of work, and if there isn't, I think it would be good to have one.

But as for your core goal of weakening the preference axioms to more realistic standards, you can definitely do that in JB while still keeping the background objects of preference as propositions in a single algebra. I think this would still preserve much of what I consider the naturalistic advantages of the JB system. For modifying the preference axioms, I would guess that descriptively you might want something like prospect theory, or something else along those broad lines. It also depends on what kinds of agents we want to describe.

alexander-gietelink-oldenziel on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Mmmmm

Inconsistent and incomplete preferences are necessary for descriptive agent foundations.

In vNM preference theory, an inconsistent preference can be described as a cyclic preference that can be money-pumped.

How do we see this in JB?
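For readers unfamiliar with the vNM-side picture, here is a minimal money-pump sketch. The items A, B, C and the one-cent fee are hypothetical, and it assumes the fee is small enough that the agent still accepts each trade it strictly prefers. It illustrates only the standard cyclic-preference case, not how the same thing would be expressed in JB.

```python
# Hypothetical cyclic (intransitive) strict preferences: A > B, B > C, C > A.
# (x, y) in PREFERS means the agent strictly prefers x to y.
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}


def better_than(holding):
    """The item the agent strictly prefers to the one it currently holds."""
    return next(x for (x, y) in PREFERS if y == holding)


def money_pump(start, fee_cents, trades):
    """Repeatedly offer the agent a preferred item for a small fee.

    Assumes the fee never outweighs the preference, so every offer is taken.
    Returns the item held at the end and the net change in money (cents).
    """
    holding, cents = start, 0
    for _ in range(trades):
        holding = better_than(holding)
        cents -= fee_cents
    return holding, cents


if __name__ == "__main__":
    # Three trades take the agent A -> C -> B -> A: same item, 3 cents poorer.
    print(money_pump("A", fee_cents=1, trades=3))
```

Every full cycle returns the agent to the item it started with while it keeps paying fees, which is the sense in which cyclic preferences are exploitable.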

martinsq on evhub's Shortform

Imo rationalists tend to underestimate the arbitrariness involved in choosing a CEV procedure (= moral deliberation in full generality).

Like you, I endorse the step of "scoping the reference class" (along with a thousand other preliminary steps). Preemptively fixing it in place helps you to the extent that the humans wouldn't have done it by default. But if the CEV procedure is governed by a group of humans so selfish/unthoughtful as to not even converge on that by themselves, then I'm sure there will be at least a few hundred other aspects (both more and less subtle than this one) that you and I obviously endorse, but they will not implement, and that will drastically affect the outcome of the whole procedure.
In fact, it seems strikingly plausible that even among EAs, the outcome could depend drastically on seemingly-arbitrary starting conditions (like "whether we use deliberation-and-distillation procedure #194 or #635, which differ in some details"). And "drastically" means that, even though both outcomes still look somewhat kindness-shaped and friendly-shaped, one's optimum is worth <10% to the other's utility (or maybe, this holds for the scope-sensitive parts of their morals, since the scope-insensitive ones are trivial to satisfy).

To pump related intuitions about how difficult and arbitrary moral deliberation can get, I like Demski here.

daniel-herrmann on Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

The JB framework as standardly formulated assumes complete and consistent preferences. Of course, you can keep the same JB-style objects of preference (the propositions) and modify the preference axioms. For incomplete preferences, there's a nice paper by Richard Bradley, Revising Incomplete Attitudes, that looks at incomplete attitudes in a very Jeffrey-Bolker style framework (all prospects are propositions). It has a nice discussion of different things that might lead to incompleteness (one of which is "Ignorance", related to the kind of Knightian uncertainty you asked about), and also some results and perspectives on attitude changes for imprecise Bayesian agents.

I'm less sure about inconsistent preferences - it depends what exactly you mean by that. Something related might be work on aggregating preferences, which can involve aggregating preferences that disagree and so look inconsistent. John Broome's paper Bolker-Jeffrey Expected Utility Theory and Axiomatic Utilitarianism is excellent on this - it examines both the technical foundations of JB and its connections to social choice and utilitarianism, proving a version of the Harsanyi Utilitarian Theorem in JB.

On imprecise probabilities: the JB framework actually has a built-in form of imprecision. Without additional constraints, the representation theorem gives non-unique probabilities (this is part of Bolker's uniqueness theorem). You can get uniqueness by adding extra conditions, like unbounded utility or primitive comparative probability judgments, but the basic framework allows for some probability imprecision. I'm not sure about deeper connections to infraprobability/Bayesianism, but given that these approaches often involve sets of probabilities, there may be interesting connections to explore.
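To see the built-in imprecision concretely: as I understand Bolker's uniqueness result, if $(P, V)$ represents a preference ordering, so does $(P', V')$ with $P'(X) = P(X)\,(cV(X) + d)$ and $V'(X) = (aV(X) + b)/(cV(X) + d)$, provided $ad - bc > 0$, $cV(X) + d > 0$ for every $X$, and $cV(\top) + d = 1$. The sketch below checks this numerically on a hypothetical three-atom algebra; the atom names, probabilities, desirabilities, and coefficients are all made up for illustration, with $V(\top) = 0$ so that $d = 1$ keeps $P'$ normalized.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical toy algebra: three atoms with made-up probabilities and
# desirabilities, chosen so that V(top) = 0.2*2 + 0.3*0 + 0.5*(-0.8) = 0.
P_ATOM = {"w1": F(2, 10), "w2": F(3, 10), "w3": F(5, 10)}
V_ATOM = {"w1": F(2), "w2": F(0), "w3": F(-8, 10)}


def propositions(atoms):
    """All non-empty propositions (sets of atoms) of the finite algebra."""
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(1, len(atoms) + 1)
            for c in combinations(atoms, r)]


def prob(p_atom, prop):
    return sum(p_atom[w] for w in prop)


def desirability(p_atom, v_atom, prop):
    """Jeffrey desirability: probability-weighted average of atom values."""
    return sum(p_atom[w] * v_atom[w] for w in prop) / prob(p_atom, prop)


def bolker_transform(p_atom, v_atom, a, b, c, d):
    """Apply P' = P(cV + d) and V' = (aV + b)/(cV + d) atom by atom.

    Since P and P*V are both additive, this yields the same P' and V' on
    every proposition as applying the formulas directly. Checking cV + d > 0
    on atoms suffices, because each V(X) lies between the atom values.
    """
    assert a * d - b * c > 0
    assert all(c * v + d > 0 for v in v_atom.values())
    new_p = {w: p_atom[w] * (c * v_atom[w] + d) for w in p_atom}
    new_v = {w: (a * v_atom[w] + b) / (c * v_atom[w] + d) for w in v_atom}
    return new_p, new_v


def same_preferences(p1, v1, p2, v2):
    """True if both (P, V) pairs rank every pair of propositions identically."""
    props = propositions(p1)
    return all(
        (desirability(p1, v1, x) >= desirability(p1, v1, y))
        == (desirability(p2, v2, x) >= desirability(p2, v2, y))
        for x in props for y in props
    )


if __name__ == "__main__":
    # d = 1 so that c*V(top) + d = 1, keeping P' a probability measure.
    new_p, new_v = bolker_transform(P_ATOM, V_ATOM, F(1), F(0), F(3, 10), F(1))
    print("P' sums to", sum(new_p.values()))
    print("same preferences:", same_preferences(P_ATOM, V_ATOM, new_p, new_v))
    print("P' differs from P:", new_p != P_ATOM)
```

Here the transformed pair ranks every proposition exactly as the original does, yet assigns different probabilities (e.g. 8/25 instead of 1/5 to `w1`), which is the non-uniqueness the comment describes.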

alfred-harwood on Ruling Out Lookup Tables

hmm, we seem to be talking past each other a bit. I think my main point in response is something like this:

In non-trivial settings, (some but not all) structural differences between programs lead to differences in input/output behaviour, even if there is a large domain for which they are behaviourally equivalent.

But that sentence lacks a lot of nuance! I'll try to break it down a bit more to find if/where we disagree (so apologies if a lot of this is re-hashing).

I agree that if two programs produce the same input/output behaviour for literally every conceivable input, then there is not much of an interesting difference between them, and you can just view one as a compression of the other.
As I said in the post, I consider a program to be 'actually calculating the function' if it is a finite program which will return $f(x)$ for every possible input $x$.
If we have a finite-length lookup table, it can only output $f(x)$ for a finite number of inputs.
If that finite number of inputs is less than the total number of possible inputs, then there is at least one input (call it $x_0$) for which the lookup table will not output $f(x_0)$.
I've left unspecified what it will do if you query the lookup table with input $x_0$. Maybe it doesn't return anything, maybe it outputs an error message, maybe it blows up. The point is that whatever it does, by definition it doesn't return $f(x_0)$.
Maybe the number of possible inputs to $f$ is finite and the lookup table is large enough to accommodate them all. In this case, the lookup table would be a 'finite program which will return $f(x)$ for every possible input $x$', so I would be happy to say that there's not much of a distinction between the lookup table and a different method of computing the function. (Of course, there is a trivial difference in the sense that they are different algorithms, but it's not the one we are concerned with here.)
However, this requires that the size of the program is on the order of (or larger than) the number of possible inputs it might face. I think that in most interesting cases this is not true.
By 'interesting cases', I mean things like humans who do not contain an internal representation of every possible sensory input, or LLMs where the number of possible sentences you could input is larger than the model itself.
This means that in 'interesting cases', a lookup table will, at some point, give a different output to a program which is directly calculating a function.
So a structural difference (of the 'lookup table or not' kind) between two finite programs will imply a behavioural difference in some domain, even if there is a large domain on which the two programs are behaviourally equivalent (for an environment where the number of possible inputs is larger than the size of the programs).
As I see it, this is the motivation behind the Agent-like structure problem. If you know that a program has agent-like structure, this can help you predict its behaviour in domains where you haven't seen it perform.
Or, conversely, if you know that selecting for certain behaviours is likely to lead to agent-like structure you can avoid selecting for those behaviours (even if, within a certain domain, those behaviours are benign) because in other domains agent-like behaviour is dangerous.
Of course, there are far more types of programs than just 'lookup table' or 'not lookup table'. Lookup tables are just one obvious way in which a program might exhibit certain behaviour in a finite domain which doesn't extend to larger domains.
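The lookup-table argument above can be sketched in a few lines. Squaring stands in for $f$ and the table size of 1000 is arbitrary; both are hypothetical choices, as is returning `None` for inputs outside the table (one of many possible "doesn't return $f(x_0)$" behaviours).

```python
def f(x: int) -> int:
    """Target function; squaring is a hypothetical stand-in."""
    return x * x


TABLE_SIZE = 1000
# A finite lookup table: its size grows with the number of inputs it covers.
LOOKUP = {x: f(x) for x in range(TABLE_SIZE)}


def lookup_program(x: int):
    """Answers from the table; returns None for any input it doesn't contain."""
    return LOOKUP.get(x)


def computing_program(x: int) -> int:
    """A constant-size program that actually calculates f for every input."""
    return f(x)


def first_disagreement(inputs):
    """First input where the two programs behave differently, else None."""
    return next((x for x in inputs
                 if lookup_program(x) != computing_program(x)), None)


if __name__ == "__main__":
    print(first_disagreement(range(TABLE_SIZE)))      # None: equivalent here
    print(first_disagreement(range(2 * TABLE_SIZE)))  # 1000: first x_0 off the table
```

The two programs are indistinguishable on the first 1000 inputs, and the structural difference only shows up once inputs outside that domain are queried.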