Steering GPT-2-XL by adding an activation vector 2023-05-13T18:42:41.321Z
Understanding and controlling a maze-solving policy network 2023-03-11T18:59:56.223Z
Beneath My Epistemic Dignity 2023-02-28T04:02:44.696Z
Probability Theory: The Logic of Science, Jaynes 2023-02-16T21:57:47.737Z
Rounding Someone Off 2023-01-24T00:03:48.682Z
Consequentialists: One-Way Pattern Traps 2023-01-16T20:48:56.967Z
Linear Algebra Done Right, Axler 2023-01-02T22:54:58.724Z
Naive Set Theory, Halmos 2022-12-22T02:34:38.509Z
Moorean Statements 2022-10-22T00:50:52.138Z
Dath Ilan's Views on Stopgap Corrigibility 2022-09-22T16:16:07.467Z
Guidelines for Mad Entrepreneurs 2022-09-16T06:33:52.450Z
Framing AI Childhoods 2022-09-06T23:40:40.138Z
The Shard Theory Alignment Scheme 2022-08-25T04:52:50.206Z
"What Mistakes Are You Making Right Now?" 2022-08-15T21:19:59.401Z
Shard Theory: An Overview 2022-08-11T05:44:52.852Z
Team Shard Status Report 2022-08-09T05:33:48.658Z
How Deadly Will Roughly-Human-Level AGI Be? 2022-08-08T01:59:55.690Z
Finding Skeletons on Rashomon Ridge 2022-07-24T22:31:59.885Z
Acceptability Verification: A Research Agenda 2022-07-12T20:11:34.986Z
Abadarian Trades 2022-06-30T16:41:12.232Z
Intelligence in Commitment Races 2022-06-24T14:30:21.525Z
How to Visualize Bayesianism 2022-06-22T13:57:09.721Z
Longtermist Consequences of a New Dark Age? 2022-06-03T23:00:07.051Z
The STEM Attractor 2022-06-03T22:21:39.986Z
Rationalism in an Age of Egregores 2022-06-01T07:29:06.297Z
Infernal Corrigibility, Fiendishly Difficult 2022-05-27T20:32:50.773Z
The "Adults in the Room" 2022-05-17T04:03:20.740Z
Gato as the Dawn of Early AGI 2022-05-15T06:52:02.264Z
Dath Ilani Rule of Law 2022-05-10T06:17:30.951Z
But What's Your *New Alignment Insight,* out of a Future-Textbook Paragraph? 2022-05-07T03:10:12.648Z
Your Utility Function is Your Utility Function 2022-05-06T07:15:24.439Z
Negotiating Up and Down the Simulation Hierarchy: Why We Might Survive the Unaligned Singularity 2022-05-04T04:21:39.742Z
Dath Ilan vs. Sid Meier's Alpha Centauri: Pareto Improvements 2022-04-28T19:26:26.664Z
EU Maximizing in a Gloomy World 2022-04-27T00:28:58.494Z
Why No *Interesting* Unaligned Singularity? 2022-04-20T00:34:55.582Z
Deceptive Agents are a Good Way to Do Things 2022-04-19T18:04:27.861Z
The Irresistible Attraction of Designing Your Own Utopia 2022-04-01T01:34:56.208Z
Agency and Coherence 2022-03-26T19:25:21.649Z
David Udell's Shortform 2022-03-18T04:41:18.780Z
On Defecting On Yourself 2022-03-18T02:21:01.572Z
Your Future Self's Credences Should Be Unpredictable to You 2022-03-11T23:33:38.448Z
HCH and Adversarial Questions 2022-02-19T00:52:29.949Z


Comment by David Udell on Work dumber not smarter · 2023-06-01T22:06:22.213Z · LW · GW

I think that many (not all) of your above examples boil down to optimizing for legibility rather than optimizing for goodness. People who hobnob instead of working quietly will get along with their bosses better than their quieter counterparts, yes. But a company of brown-nosers will be less productive than a competitor company of quiet, hardworking employees! So there's a cooperate/defect dilemma here.

What that suggests, I think, is that you generally shouldn't immediately defect as hard as possible when it comes to optimizing for appearances. Match the prevailing local balance between optimizing-for-appearances and optimizing-for-outcomes that everyone around you strikes, and try not to incrementally lower the level of org-wide cooperation. Try to nudge that level of cooperation up, and set up incentives accordingly.

Comment by David Udell on David Udell's Shortform · 2023-05-26T03:43:14.562Z · LW · GW

The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers.

This is not a coincidence because nothing is a coincidence.

Comment by David Udell on David Udell's Shortform · 2023-05-02T02:02:48.108Z · LW · GW

Two moments of growing in mathematical maturity I remember vividly:

  1. Realizing that equations are claims that are therefore either true or false. Everything asserted with symbols... could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word "is"!
  2. Learning about the objects that mathematical claims are about. Going from having to look up "Wait, what's a real number again?" to knowing how $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, and $\mathbb{R}$ interrelate told me what we're making claims about. Of course, there are plenty of other mathematical objects -- but getting to know these objects taught me the general pattern.

Comment by David Udell on Exposure to Lizardman is Lethal · 2023-04-01T01:54:23.924Z · LW · GW

I found it distracting that all your examples were topical, anti-red-tribe-coded events. That reminded me of this:

In Artificial Intelligence, and particularly in the domain of nonmonotonic reasoning, there’s a standard problem: “All Quakers are pacifists. All Republicans are not pacifists. Nixon is a Quaker and a Republican. Is Nixon a pacifist?”

What on Earth was the point of choosing this as an example? To rouse the political emotions of the readers and distract them from the main question? To make Republicans feel unwelcome in courses on Artificial Intelligence and discourage them from entering the field? (And no, I am not a Republican. Or a Democrat.)

Why would anyone pick such a distracting example to illustrate nonmonotonic reasoning? Probably because the author just couldn’t resist getting in a good, solid dig at those hated Greens. It feels so good to get in a hearty punch, y’know, it’s like trying to resist a chocolate cookie.

As with chocolate cookies, not everything that feels pleasurable is good for you.

That is, reading this, I felt that there were tribal-status markers mixed in with your claims that didn't need to be there, and that struck me as defecting on a stay-non-politicized discourse norm.

Comment by David Udell on David Udell's Shortform · 2023-03-30T21:12:54.567Z · LW · GW

2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.

a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs

12. The principal of a private school is a member of Planned Parenthood and, off-duty, speaks out about contraception and the morning after pill. The board of the private school decides this is inappropriate given the school’s commitment to abstinence and moral education and asks the principal to stop these speaking engagements or step down from his position.

a) The school board is acting within its rights; they can insist on a principal who shares their values
b) The school board should back off; it’s none of their business what he does in his free time

[Difference] of 0 to 3: You are an Object-Level Thinker. You decide difficult cases by trying to find the solution that makes the side you like win and the side you dislike lose in that particular situation.

[Difference] of 4 to 6: You are a Meta-Level Thinker. You decide difficult cases by trying to find general principles that can be applied evenhandedly regardless of which side you like or dislike.

--Scott Alexander, "The Slate Star Codex Political Spectrum Quiz"

The Character of an Epistemic Prisoner's Dilemma

Say there are two tribes. The tribes hold fundamentally different values, but they also model the world in different terms. Each thinks members of the other tribe are mistaken, and that some of their apparent value disagreement would be resolved if the others' mistakes were corrected.

Keeping this in mind, let's think about inter-tribe cooperation and defection.

Ruling by Reference Classes, Rather Than Particulars

In the worst equilibrium, actors from each tribe evaluate political questions in favor of their own tribe, against the outgroup. In their world model, this is to a great extent for the benefit of the outgroup members as well.

But this is a shitty regime to live under when it's done back to you too, so rival tribes can sometimes come together to implement an impartial judiciary. The natural way to do this is to have a judiciary classifier rule for reference classes of situations, and to have a separate impartial classifier sort situations into reference classes.

You're locally worse off this way, but are globally much better off.
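The two-classifier design can be sketched structurally in code. Everything here is hypothetical (the `Case` fields, the `classify` rule, the placeholder ruling string); the only point is that the ruling function sees reference classes, never tribes, so swapping which tribe an actor belongs to cannot change the outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    conduct: str      # what the actor did
    setting: str      # the institutional context of the act
    actor_tribe: str  # recorded, but deliberately unused below


def classify(case: Case) -> tuple[str, str]:
    """Hypothetical impartial classifier: sort a case into a reference class.

    It reads only conduct and setting, so two cases that differ only in
    actor_tribe always land in the same reference class.
    """
    return (case.conduct, case.setting)


# Hypothetical rulings, fixed once per reference class ahead of time.
# The ruling string is a placeholder, not a recommendation.
RULINGS = {
    ("off-duty advocacy against employer's stated values", "private employer"):
        "ruling #1 (placeholder)",
}


def rule(case: Case) -> str:
    """The judiciary rules on the reference class, never on the particular case."""
    return RULINGS[classify(case)]


red = Case("off-duty advocacy against employer's stated values", "private employer", "red")
blue = Case("off-duty advocacy against employer's stated values", "private employer", "blue")
print(rule(red) == rule(blue))
```

Any tribe you like can be favored or disfavored by the choice of rulings, but only at the level of the whole reference class, which is the "locally worse off, globally better off" trade.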

Comment by David Udell on David Udell's Shortform · 2023-03-17T22:01:33.732Z · LW · GW

Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially attuned to filtered evidence and its significance for one's world model.

If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with not too much more work. Given that, you won't want to update dramatically in favor of the claim -- the powerful evidence to the contrary could, you infer, be unearthed without much more work. You learn something about the other side of the issue from how quickly or slowly the world yielded evidence in the other direction. If it's considered a social faux pas to give strong arguments for one side of a claim, then your prior about how hard it is to find strong arguments for that side of the claim will be doing a lot of the heavy lifting in fixing your world model. And so on, for the evidential consequences of other kinds of motivated search and rationalization.

In brief, you can do epistemically better than ignoring how much search power went into finding all the evidence. You can do better than only evaluating the object-level evidential considerations! You can take expended search into account, in order to model what evidence is likely hiding, where, behind how much search debt.
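One way to see why expended search matters is a toy model, which is entirely my own assumption rather than anything stated above: each search step independently turns up a strong argument for a claim H with probability `p_h` if H is true and `p_not_h` if it is false. Then the step at which the first strong argument appears is geometric, and the likelihood ratio depends on how long the search took, not just on the fact that a strong argument was found.

```python
import math


def search_likelihood_ratio(k: int, p_h: float, p_not_h: float) -> float:
    """P(first strong argument turns up at search step k | H) / P(same | not-H).

    Toy assumption: each search step independently yields a strong argument
    for the claim with probability p_h if H is true and p_not_h if H is
    false, so the step of the first hit is geometrically distributed.
    """
    if k < 1:
        raise ValueError("k counts search steps and starts at 1")
    log_lr = (math.log(p_h) - math.log(p_not_h)
              + (k - 1) * (math.log1p(-p_h) - math.log1p(-p_not_h)))
    return math.exp(log_lr)


def posterior(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability by a likelihood ratio, working in odds."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


# In this toy setup, strong arguments are easier to find for true claims.
quick = posterior(0.5, search_likelihood_ratio(1, p_h=0.5, p_not_h=0.1))
slow = posterior(0.5, search_likelihood_ratio(20, p_h=0.5, p_not_h=0.1))
# If strong arguments are equally easy to find either way, nothing is learned.
uninformative = posterior(0.5, search_likelihood_ratio(5, p_h=0.3, p_not_h=0.3))

print(f"found on step 1:  P(H) = {quick:.3f}")
print(f"found on step 20: P(H) = {slow:.5f}")
print(f"equal hit rates:  P(H) = {uninformative:.3f}")
```

Under these made-up rates, a strong argument found on the first step moves P(H) from 0.5 to 5/6, while the same argument surfacing only on step 20 drives P(H) below 0.001: the long unsuccessful search is itself evidence. When the hit rates are equal, the posterior stays at 0.5 no matter how strong the argument looks.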

Comment by David Udell on David Udell's Shortform · 2023-03-03T01:37:01.809Z · LW · GW

Modest spoilers for planecrash (Book 9 -- null action act II).

Nex and Geb had each INT 30 by the end of their mutual war.  They didn't solve the puzzle of Azlant's IOUN stones... partially because they did not find and prioritize enough diamonds to also gain Wisdom 27.  And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion's spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly measured by Detect Thoughts nor by tests of legible ability at using existing math.  (Keltham has slightly above-average intelligence for dath ilan, reflectivity well below average, and an ordinary amount of that spark.)

But most of all, Nex and Geb didn't solve IOUN stones because they didn't come from a culture that had already developed digital computation and analog signal processing.  Or on an even deeper level - because those concepts can't really be that hard at INT 30, even if your WIS is much lower and you are missing some sparks - they didn't come from a culture which said that inventing things like that is what the Very Smart People are supposed to do with their lives, nor that Very Smart People are supposed to recheck what their society told them were the most important problems to solve.

Nex and Geb came from a culture which said that incredibly smart wizards were supposed to become all-powerful and conquer their rivals; and invent new signature spells that would be named after them forever after; and build mighty wizard-towers, and raise armies, and stabilize impressively large demiplanes; and fight minor gods, and surpass them; and not, particularly, question society's priorities for wizards.  Nobody ever told Nex or Geb that it was their responsibility to be smarter than the society they grew up in, or use their intelligence better than common wisdom said to use it.  They were not prompted to look in the direction of analog signal processing; and, more importantly in the end, were not prompted to meta-look around for better directions to look, or taught any eld-honed art of meta-looking.

--Eliezer, planecrash

Comment by David Udell on David Udell's Shortform · 2023-03-02T00:39:38.638Z · LW · GW

What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?

Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.

Comment by David Udell on David Udell's Shortform · 2023-02-13T02:07:28.807Z · LW · GW

In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom's motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom's motion that the dimension in that direction vanished. That left only three dimensions of space -- all perpendicular to the atom's direction of motion -- and the ghost of the lost fourth dimension, which makes itself felt as the current of time. Now atoms moving in different directions cannot share the same directional flow of time. Each takes on the particular current it perceives as the proper measure of time.

You measure only... as projected on your time and space dimensions.

--Lewis Carroll Epstein, Relativity Visualized (1997)
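Epstein's "everything moves through spacetime at the speed of light" can be checked with a line of arithmetic. A minimal sketch, in units where c = 1 and under my reading of his space-proper-time diagrams: an object with spatial speed β advances through proper time at rate dτ/dt = √(1 − β²), the ordinary time-dilation factor, so β² + (dτ/dt)² = 1 at every speed. The function name is mine.

```python
import math


def spacetime_components(beta: float) -> tuple[float, float]:
    """Split a unit 'speed through spacetime' into space and proper-time parts.

    beta is the spatial speed as a fraction of c (units with c = 1). The
    proper-time component is d(tau)/dt = sqrt(1 - beta**2), i.e. 1/gamma.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta, math.sqrt(1.0 - beta * beta)


for beta in (0.0, 0.6, 0.8, 1.0):
    space, proper = spacetime_components(beta)
    total = math.hypot(space, proper)
    print(f"beta={beta}: space={space:.2f}, proper time={proper:.2f}, total={total:.2f}")
```

The total is 1 in every row; only its split between space and proper time changes. At β = 1 the proper-time component is 0, the arithmetic counterpart of a light-speed "atom" losing the dimension along its motion.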

Comment by David Udell on David Udell's Shortform · 2023-02-08T23:32:28.507Z · LW · GW

Past historical experience and brainstorming about human social orders probably barely scratch the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.

[1] (Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)

Comment by David Udell on David Udell's Shortform · 2023-02-08T23:24:00.988Z · LW · GW

Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.

Comment by David Udell on My Model Of EA Burnout · 2023-01-28T03:51:53.113Z · LW · GW

This post crystallized some thoughts that have been floating in my head, inchoate, since I read Zvi's stuff on slack and Valentine's "Here's the Exit."

Part of the reason that it's so hard to update on these 'creative slack' ideas is that we make deals among our momentary mindsets to work hard when it's work-time. (And when it's literally the end of the world at stake, it's always work-time.) "Being lazy" is our label for someone who hasn't established that internal deal between their varying mindsets, and so is flighty and hasn't precommitted to getting stuff done even if they currently aren't excited about work.

Once you've installed that internal flinch away from not working/precommitment to work anyways, though, it's hard to accept that hard work is ever a mistake, because that seems like your current mindset trying to rationalize its way out of cooperating today!

I think I finally got past this flinch/got out of running that one particular internal status race, thanks to this and the aforementioned posts.

Comment by David Udell on David Udell's Shortform · 2023-01-26T05:43:23.459Z · LW · GW

A model I picked up from Eric Schwitzgebel.

The humanities used to be highest-status in the intellectual world!

But then, scientists quite visibly exploded fission weapons and put someone on the moon. It's easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.

Comment by David Udell on David Udell's Shortform · 2023-01-26T05:34:50.668Z · LW · GW

"Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search."

Comment by David Udell on David Udell's Shortform · 2023-01-21T01:32:10.626Z · LW · GW

In the 1920s when λ and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts on an arbitrary input-object. (The rules need not produce an output for every input.) A simple example is the permutation-operation $\phi$ defined by

$\phi(a, b, c) = (b, c, a)$

Nowadays one would think of a computer program, though the 'operation-process' concept was not originally intended to have the finiteness and effectiveness limitations that are involved with computation.

Perhaps the most important difference between operators and functions is that an operator may be defined by describing its action without defining the set of inputs for which this action produces results, i.e., without defining its domain. In a sense, operators are 'partial functions.'

A second important difference is that some operators have no restriction on their domain; they accept any inputs, including themselves. The simplest example is the identity operator $I$, which is defined by the operation of doing nothing at all. If this is accepted as a well-defined concept, then surely the operation of doing nothing can be applied to it. We simply get

$II = I$

Of course, it is not claimed that every operator is self-applicable; this would lead to contradictions. But the self-applicability of at least such simple operators as $I$, $K$, and $S$ seems very reasonable.

The operator concept can be modelled in standard ZF set theory if, roughly speaking, we interpret operators as infinite sequences of functions (satisfying certain conditions), instead of as single functions. This was discovered by Dana Scott in 1969 (pp. 45-6).

--Hindley and Seldin, Lambda-Calculus and Combinators (2008)
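Untyped Python functions make a rough, informal stand-in for the operators described above: none of them declares a domain, so nothing stops applying one to itself. This is my illustration, not the book's formalism; I, K, and S follow the standard combinator rules Ix = x, Kxy = x, and Sxyz = xz(yz), and `phi` is the permutation-operation.

```python
def I(x):
    """Identity operator: Ix = x."""
    return x


def K(x):
    """Constant-maker, curried: Kxy = x."""
    return lambda y: x


def S(x):
    """Curried substitution: Sxyz = xz(yz)."""
    return lambda y: lambda z: x(z)(y(z))


def phi(a, b, c):
    """The permutation-operation: phi(a, b, c) = (b, c, a)."""
    return (b, c, a)


print(phi(1, 2, 3))   # (2, 3, 1)
print(I(I) is I)      # True: II = I
print(K(K)(0) is K)   # True: KK applied to anything gives back K
print(S(K)(K)(42))    # 42: SKK acts like I
```

Python happily evaluates `I(I)` and `K(K)` because functions here are rules of action rather than sets of ordered pairs with a fixed domain, which is the operator concept the passage describes.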

Comment by David Udell on Consequentialists: One-Way Pattern Traps · 2023-01-19T04:16:53.689Z · LW · GW

Given a transformer model, it's probably possible to find a reasonably concise energy function (probably of a similar OOM of complexity as the model weights themselves) whose minimization corresponds to executing forwards passes of the transformer. However, this [highly compressive] energy function wouldn't tell you much about what the personas simulated by the model "want" or how agentic they were, since the energy function is expressed in the ontology of model weights and activations, not an agent's beliefs / goals. [This has] the type signature of a utility function, that meaningfully compress a system's behavior, without... telling you much about the long term behavior / goals of the system.

When I think about the powerful AGI taking over the lightcone, I can definitely see it efficiently juggling familiar resources between nodes in space. E.g., it'll want to build power collectors of some description around the sun and mine the asteroids. I can understand that AGI as a resource inventory whose feelers grow its resource stocks with time. The AGI's neural network can also be accurately modeled as an energy function being minimized, expressed in terms of neural network stuffs instead of in familiar resources.

I wouldn't be terribly surprised if something similar was true for human brains, too. I can model people as steadily accruing social-world resources, like prestige, propriety, money, attractiveness, etc. There's perhaps also some tidy neural theory, expressed in an alien mathematical ontology, that very compactly predicts an arbitrary actual brain's motor outputs.

I guess I'm used to modeling people as coherent behavioral profiles with respect to social resources because social resources are an abstraction I have. (I don't know what given social behaviors would imply about neural outputs expressed in wholly neural ontology, if anything.) If I had some other suite of alien mathematical abstractions that gave me precognitive foresight into people's future motor outputs, and I could practically operate those alien abstractions, I'd probably switch over to entirely modeling people that way instead. Until I have those precog math abstractions, I have to keep modeling people in the ontology of familiar features, i.e. social resources.

It seems totally plausible to me that an outwardly sclerotic DMV that never goes out of its way to help the public could still have tight internal coordination and close ranks to thwart hostile management, and that an outwardly helpful / flexible DMV that focuses on the spirit of the law might fail to do so.

I completely agree, or at least that isn't a crux for me here. I'm confused about the extent to which I should draw inferences about AGI behavior from my observations of large human organizations. I feel like that's the wrong thing to analogize to. Like, if you can find ~a human brain via gradient descent, you can find a different better nearby brain more readily than you can find a giant organization of brains satisfying some behavioral criteria. Epistemic status: not very confident. Anyways, the analogy between AGI and organizations seems weak, and I didn't intend for it to be a more-than-illustrative, load-bearing part of the post's argument.

Similarly, do top politicians seem to have particularly "consequentialist" cognitive styles? If consequentialist thinking and power accumulation actually do go together hand in hand, then we should expect top politicians to be disproportionately very consequentialist. But if I think about specific cognitive motions that I associate with the EY-ish notion of "consequentialism", I don't think top politicians are particularly inclined towards such motions. E.g., how many of them "actively work on becoming ever more consequentialist"? Do they seem particularly good at having coherent internal beliefs? Or a wide range of competence in many different (seemingly) unrelated domains?

I think the model takes a hit here, yeah... though I don't wholly trust my own judgement of top politicians, for politics-is-the-mindkiller reasons. I'm guessing there's an elephant in the brain thing here where, like in flirting, you have strong ancestral pressures to self-deceive and/or maintain social ambiguity about your motives. I (maybe) declare, as an ex post facto epicycle, that human tribal politics is weird (like human flirting and a handful of other ancestral-signaling-heavy domains).

Business leaders do strike me as disproportionately interested in outright self-improvement and in explicitly improving the efficiency of their organization and their own work lives. Excepting the above epicycles, I also expect business leaders to have notably-better-than-average internal maps of the local territory and better-than-average competence in many domains. Obviously, there are some significant confounds there, but still.

Comment by David Udell on The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints · 2023-01-15T05:07:39.815Z · LW · GW

This is a great theorem that's stuck around in my head this last year! It's presented clearly and engagingly, but more importantly, the ideas in this piece are suggestive of a broader agent foundations research direction. If you wanted to intimate that research direction with a single short post that additionally demonstrates something theoretically interesting in its own right, this might be the post you'd share.

Comment by David Udell on Your Dog is Even Smarter Than You Think · 2023-01-15T05:00:15.497Z · LW · GW

This post has successfully stuck around in my mind for two years now! In particular, it's made me explicitly aware of the possibility of flinching away from observations because they're normie-tribe-coded.

I think I deny the evidence on most of the cases of dogs generating complex English claims. But it was epistemically healthy for that model anomaly to be rubbed in my face, rather than filter-bubbled away plus flinched away from and ignored.

Comment by David Udell on Coase's "Nature of the Firm" on Polyamory · 2023-01-15T04:52:08.773Z · LW · GW

This is a fantastic piece of economic reasoning applied to a not-flagged-as-economics puzzle! As the post says, a lot of its content is floating out there on the internet somewhere: the draw here is putting all those scattered insights together under their common theory of the firm and transaction costs framework. In doing so, it explicitly hooked up two parts of my world model that had previously remained separate, because they weren't obviously connected.

Comment by David Udell on David Udell's Shortform · 2023-01-10T00:13:36.838Z · LW · GW

Complex analysis is the study of functions of a complex variable, i.e., functions $f(z)$ where $z$ and $f(z)$ lie in $\mathbb{C}$. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.

--Pugh, Real Mathematical Analysis (p. 28)

Comment by David Udell on David Udell's Shortform · 2023-01-09T22:14:40.491Z · LW · GW

One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.

If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can work to get that person to think in terms of Bayesian epistemology and not decision theory if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increased. If you don't, your counterparty isn't going to treat that as a good-faith interaction, and they're going to stay in a bad faith, "arguments as soldiers" conversational mode instead.

When a community puts in the hard work of cooperating in maintaining a strong epistemic commons, you don't have to put as much effort into your communications protocol if you want to get a model across. When a community's collective epistemology is degraded, you have to do this work, always packaging your points just so, as the price of communicating.

Comment by David Udell on Linear Algebra Done Right, Axler · 2023-01-04T01:27:20.630Z · LW · GW

Thanks -- right on both counts! Post amended.

Comment by David Udell on David Udell's Shortform · 2022-12-23T23:03:18.744Z · LW · GW

An Inconsistent Simulated World

I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).

Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.

What are the flaws you can notice inside your simulated world?

Physics is internally consistent. But your model of the physical world almost certainly isn't! And your world-model doesn't feel like just a model... it's instead just how the world is. What inconsistencies -- there's at least one -- can you see in the world you live in? (If you lived in an inconsistent simulated world, would you notice?)

Comment by David Udell on David Udell's Shortform · 2022-12-08T23:08:32.218Z · LW · GW

When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.

Comment by David Udell on David Udell's Shortform · 2022-12-01T21:05:06.813Z · LW · GW

Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.

This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible to your peers and managers. If you have something to protect, though, keep your eye squarely on the ball and optimize for EV, not directly for legible appearances.

Comment by David Udell on David Udell's Shortform · 2022-12-01T20:58:26.438Z · LW · GW

A multiagent Extrapolated Volitionist institution is something that computes and optimizes for a Convergent Extrapolated Volition, if a CEV exists.

Really, though, the above Extrapolated Volitionist institutions do take other people into consideration. They either give everyone the Schelling weight of one vote in a moral parliament, or they take into consideration the epistemic credibility of other bettors as evinced by their staked wealth, or other things like that.

Sometimes the relevant interpersonal parameters can be varied, and the institutional designs don't weigh in on that question. The ideological emphasis is squarely on individual considered preferences -- that is the core insight of the outlook. "Have everyone get strictly better outcomes by their lights, probably in ways that surprise them but would be endorsed by them after reflection and/or study."
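The point that the interpersonal weighting is a free parameter can be made concrete with a toy aggregator. Everything here is hypothetical: the voters, options, and stakes are invented, and a weighted plurality vote is a deliberate simplification, not a full moral parliament (where delegates bargain) or a betting market.

```python
from collections import defaultdict


def aggregate(votes: dict[str, str], weights: dict[str, float]) -> str:
    """Return the option with the largest total weight behind it.

    The weighting scheme is an input, not something this rule decides,
    mirroring how the institutional design leaves that parameter open.
    """
    scores: dict[str, float] = defaultdict(float)
    for voter, option in votes.items():
        scores[option] += weights[voter]
    return max(scores, key=scores.__getitem__)


# Hypothetical voters and options.
votes = {"ann": "plan_a", "bob": "plan_b", "cat": "plan_b"}
one_vote_each = {voter: 1.0 for voter in votes}
staked_wealth = {"ann": 10.0, "bob": 2.0, "cat": 3.0}

print(aggregate(votes, one_vote_each))   # plan_b
print(aggregate(votes, staked_wealth))   # plan_a
```

The same preferences yield different outcomes under the two weightings, which is exactly the parameter the designs leave unsettled while keeping the emphasis on each person's considered preferences.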

Comment by David Udell on David Udell's Shortform · 2022-12-01T05:25:49.849Z · LW · GW

Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.

Extrapolated Volitionist institutions are all characteristically "meta": they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.

Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!

Comment by David Udell on David Udell's Shortform · 2022-11-29T21:53:57.317Z · LW · GW

Stress and time-to-burnout are resources to be juggled, like any other.

Comment by David Udell on David Udell's Shortform · 2022-11-28T22:43:52.123Z · LW · GW

“What is the world trying to tell you?”

I've found that this prompt helps me think clearly about the evidence shed by the generator of my observations.

Comment by David Udell on David Udell's Shortform · 2022-11-22T05:08:34.024Z · LW · GW

As Gauss stressed long ago, any kind of singular mathematics acquires a meaning only as a limiting form of some kind of well-behaved mathematics, and it is ambiguous until we specify exactly what limiting process we propose to use. In this sense, singular mathematics has necessarily a kind of anthropomorphic character; the question is not what is it, but rather how shall we define it so that it is in some way useful to us?

--E. T. Jaynes, Probability Theory (p. 108)

Comment by David Udell on David Udell's Shortform · 2022-11-22T04:13:35.345Z · LW · GW

Bogus nondifferentiable functions

The case most often cited as an example of a nondifferentiable function is derived from a sequence $f_n(x)$, each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length $1/n$. As $n \to \infty$, the triangles shrink to zero size. For any finite $n$, the slope of $f_n(x)$ is $\pm 1$ almost everywhere. Then what happens as $n \to \infty$? The limit $f_\infty(x)$ is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivative, $f_n'(x)$, does not exist; but it is the derivative of the limit that is in question here, $f_\infty(x) \equiv 0$, and this is certainly differentiable. Any number of such sequences $f_n(x)$ with discontinuous slope on a finer and finer scale may be defined. The error of calling the resulting limit $f_\infty(x)$ nondifferentiable, on the grounds that the limit of the derivative does not exist, is common in the literature. In many cases, the limit of such a sequence of bad functions is actually a well-behaved function (although awkwardly defined), and we have no reason to exclude it from our system.

Lebesgue defended himself against his critics thus: ‘If one wished always to limit himself to the consideration of well-behaved functions, it would be necessary to renounce the solution of many problems which were proposed long ago and in simple terms.’ The present writer is unable to cite any specific problem which was thus solved; but we can borrow Lebesgue’s argument to defend our own position.

To reject limits of sequences of good functions is to renounce the solution of many current real problems. Those limits can and do serve many useful purposes, which much current mathematical education and practice still tries to stamp out. Indeed, the refusal to admit delta-functions as legitimate mathematical objects has led mathematicians into error...

But the definition of a discontinuous function which is appropriate in analysis is our limit of a sequence of continuous functions. As we approach that limit, the derivative develops a higher and sharper spike. However close we are to that limit, the spike is part of the correct derivative of the function, and its contribution must be included in the exact integral...

It is astonishing that so few non-physicists have yet perceived this need to include delta-functions, but we think it only illustrates what we have observed independently; those who think of fundamentals in terms of set theory fail to see its limitations because they almost never get around to useful, substantive calculations.

So, bogus nondifferentiable functions are manufactured as limits of sequences of rows of tinier and tinier triangles, and this is accepted without complaint. Those who do this while looking askance at delta-functions are in the position of admitting limits of sequences of bad functions as legitimate mathematical objects, while refusing to admit limits of sequences of good functions! This seems to us a sick policy, for delta-functions serve many essential purposes in real, substantive calculations, but we are unable to conceive of any useful purpose that could be served by a nondifferentiable function. It seems that their only use is to provide trouble-makers with artificially contrived counter-examples to almost any sensible and useful mathematical statement one could make. Henri Poincaré (1909) noted this in his characteristically terse way:

In the old days when people invented a new function they had some useful purpose in mind: now they invent them deliberately just to invalidate our ancestors’ reasoning, and that is all they are ever going to get out of them.

We would point out that those trouble-makers did not, after all, invalidate our ancestors’ reasoning; their pathology appeared only because they adopted, surreptitiously, a different definition of the term ‘function’ than our ancestors used. Had this been pointed out, it would have been clear that there was no need to modify our ancestors’ conclusions...

Note, therefore, that we stamp out this plague too, simply by our defining the term ‘function’ in the way appropriate to our subject. The definition of a mathematical concept that is ‘appropriate’ to some field is the one that allows its theorems to have the greatest range of validity and useful applications, without the need for a long list of exceptions, special cases, and other anomalies. In our work the term ‘function’ includes good functions and well-behaved limits of sequences of good functions; but not nondifferentiable functions. We do not deny the existence of other definitions which do include nondifferentiable functions, any more than we deny the existence of fluorescent purple hair dye in England; in both cases, we simply have no use for them.

--E. T. Jaynes, Probability Theory (2003, pp. 669-71)

It's somewhat incredible to read this while simultaneously picking up some set theory. It reminds me not to absorb what's written in the high-status textbooks entirely uncritically, and to keep in mind that there's a good amount of convention behind what's in the books.
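A quick numerical sketch of Jaynes's triangle example, assuming one concrete choice for $f_n$ that is my own rather than Jaynes's formula: the distance from $x$ to the nearest multiple of $1/n$, which traces out exactly such a row of isosceles right triangles (height $1/(2n)$). The heights shrink toward the zero function while the slope stays at $\pm 1$, which is why the limit of the derivatives fails to exist even though the limit function is trivially differentiable.

```python
def f(n: int, x: float) -> float:
    """f_n(x): a row of isosceles right triangles with hypotenuses of length 1/n
    on the real axis (height 1/(2n), slope +1 or -1 between the kinks)."""
    t = n * x
    return abs(t - round(t)) / n  # distance from x to the nearest multiple of 1/n


def sup_on_unit_interval(n: int, samples: int = 10_000) -> float:
    """Approximate the maximum of f_n on [0, 1] by sampling a uniform grid."""
    return max(f(n, i / samples) for i in range(samples + 1))


def slope(n: int, x: float) -> float:
    """Central-difference estimate of f_n'(x), valid away from the kinks."""
    h = 1e-6 / n
    return (f(n, x + h) - f(n, x - h)) / (2 * h)


for n in (1, 10, 100, 1000):
    # The sup shrinks like 1/(2n), so f_n -> 0 uniformly,
    # while the slope a quarter of the way up each triangle stays at +1.
    print(n, sup_on_unit_interval(n), slope(n, 0.25 / n))
```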

Comment by David Udell on David Udell's Shortform · 2022-11-16T00:48:19.942Z · LW · GW

Epistemic status: politics, known mindkiller; not very serious or considered.

People seem to have a God-shaped hole in their psyche: just as people once banded together around religious tribal affiliations, people in the contemporary West now band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in England was one of the instigating causes of English settlers migrating to the American colonies; religious conflict in Europe generally was severe.

In the US, the 1st Amendment legally protects freedom of religion from the state. This can be modeled as a response to severe intertribal conflict: bake into your new state rules that forgo the benefits of persecuting your outgroup when you're in power, in exchange for some guarantee of not being persecuted yourself when some other tribe is in power. An extension of the spirit of the 1st Amendment to contemporary tribal conflicts would, then, protect "political-tribal freedom" from the state.

A full generalization of the Amendment would protect the "freedom of tribal affiliation and expression" from the state. For this to work, people would also have to have interpersonal best practices that mostly tolerate outgroup membership in most areas of private life.

Comment by David Udell on David Udell's Shortform · 2022-11-14T23:40:44.800Z · LW · GW

The explicit definition of an ordered pair ($(a, b) = \{\{a\}, \{a, b\}\}$) is frequently relegated to pathological set theory...

It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irrelevant properties that are accidental and distracting. The theorem that $(a, b) = (x, y)$ if and only if $a = x$ and $b = y$ is the sort of thing we expect to learn about ordered pairs. The fact that $\{a, b\} \in (a, b)$, on the other hand, seems accidental; it is a freak property of the definition rather than an intrinsic property of the concept.

The charge of artificiality is true, but it is not too high a price to pay for conceptual economy. The concept of an ordered pair could have been introduced as an additional primitive, axiomatically endowed with just the right properties, no more and no less. In some theories this is done. The mathematician's choice is between having to remember a few more axioms and having to forget a few accidental facts; the choice is pretty clearly a matter of taste. Similar choices occur frequently in mathematics...

--Paul R. Halmos, Naïve Set Theory (1960, pp. 24-5)
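To make the "freak property" concrete, here is a small Python sketch of the definition above using frozensets. The helper names `kpair`, `first`, and `second` are hypothetical, chosen for illustration. It brute-forces the characteristic property over a small domain and also confirms the accidental one.

```python
from itertools import product


def kpair(a, b):
    """Ordered pair (a, b) = {{a}, {a, b}}, built from frozensets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def first(p):
    """The first coordinate is the unique element common to every member of p."""
    (a,) = frozenset.intersection(*p)
    return a


def second(p):
    """The second coordinate is what remains of the union once a is removed
    (or a itself, when the pair collapses to {{a}} because a == b)."""
    a = first(p)
    rest = frozenset.union(*p) - {a}
    return next(iter(rest)) if rest else a


# The property we *want*: (a, b) = (x, y) iff a = x and b = y.
domain = range(3)
characteristic = all(
    (kpair(a, b) == kpair(x, y)) == (a == x and b == y)
    for a, b, x, y in product(domain, repeat=4)
)

# The accidental property: {a, b} is a member of (a, b).
accidental = frozenset({1, 2}) in kpair(1, 2)

print(characteristic, accidental)  # True True
```

Nothing in the characteristic check depends on the accidental property; that is the sense in which it is a feature of this encoding rather than of ordered pairs.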

Comment by David Udell on David Udell's Shortform · 2022-11-08T14:51:38.910Z · LW · GW

Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?

Comment by David Udell on K-types vs T-types — what priors do you have? · 2022-11-04T14:05:03.715Z · LW · GW

Very cool! I have noticed that in arguments in ordinary academia people sometimes object that "that's so complicated" when I take a lot of deductive steps. I hadn't quite connected this with the idea that:

If you're confident in your assumptions ( is small), or if you're unconfident in your inferences ( is big), then you should penalise slow theories moreso than long theories, i.e. you should be a T-type.

I.e., that holding a T-type prior is adaptive when even your deductive inferences are noisy.
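A toy model of the quoted claim, under assumptions of my own: each assumption of a theory is independently false with probability `alpha`, each inference step is independently invalid with probability `beta`, and the prior is just the chance that everything holds. These parameter names are hypothetical and may not match the symbols used in the original post.

```python
def prior(n_assumptions: int, n_inferences: int, alpha: float, beta: float) -> float:
    """Hypothetical toy prior: the theory survives only if every assumption holds
    (each fails independently w.p. alpha) and every inference step is valid
    (each fails independently w.p. beta)."""
    return (1 - alpha) ** n_assumptions * (1 - beta) ** n_inferences


long_theory = (10, 1)  # many assumptions, few inference steps
slow_theory = (1, 10)  # few assumptions, many inference steps

# Confident assumptions, noisy inferences: slow theories are penalised more (T-type).
print(prior(*long_theory, alpha=0.01, beta=0.2) > prior(*slow_theory, alpha=0.01, beta=0.2))  # True
# Noisy assumptions, reliable inferences: long theories are penalised more (K-type).
print(prior(*long_theory, alpha=0.2, beta=0.01) > prior(*slow_theory, alpha=0.2, beta=0.01))  # False
```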

Also, I take it that this row of your table:

Analogies | Different systems will follow the same rules. | Different systems will follow the same rules.

should read "...follow different rules." in the T-types column.

Comment by David Udell on Why Aren't There More Schelling Holidays? · 2022-11-01T16:55:14.010Z · LW · GW

FWIW, this post strikes me as a very characteristically 'Hansonian' insight.

Comment by David Udell on David Udell's Shortform · 2022-10-27T14:53:44.955Z · LW · GW

Now, whatever $A$ may assert, the fact that $A$ can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, $A$ could certainly be deduced from them!

This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s long and complicated proof.

Now suppose that the axioms contain an inconsistency. Then the opposite of $A$ and therefore the contradiction $A \wedge \neg A$ can also be deduced from them:


So, if there is an inconsistency, its existence can be proved by exhibiting any proposition $A$ and its opposite $\neg A$ that are both deducible from the axioms. However, in practice it may not be easy to find an $A$ for which one sees how to prove both $A$ and $\neg A$. Evidently, we could prove the consistency of a set of axioms if we could find a feasible procedure which is guaranteed to locate an inconsistency if one exists; so Gödel's theorem seems to imply that no such procedure exists. Actually, it says only that no such procedure derivable from the axioms of the system being tested exists.

--E. T. Jaynes, Probability Theory (p. 46), logical symbolism converted to standard symbols
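The point in the first paragraph (a contradiction lets you deduce any $A$ whatsoever) is the principle of explosion, and it can be checked with a brute-force truth table. A minimal Python sketch; the helper names are my own:

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p -> q."""
    return (not p) or q


def is_tautology(formula, n_vars: int) -> bool:
    """Check a boolean formula under every assignment of its n_vars variables."""
    return all(formula(*values) for values in product([False, True], repeat=n_vars))


def explosion(p: bool, a: bool) -> bool:
    """(P and not P) -> A: from a contradiction, any A follows."""
    return implies(p and not p, a)


def non_entailment(p: bool, a: bool) -> bool:
    """P -> A: a consistent premise alone does not yield an arbitrary A."""
    return implies(p, a)


print(is_tautology(explosion, 2), is_tautology(non_entailment, 2))  # True False
```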

Comment by David Udell on Moorean Statements · 2022-10-24T23:28:09.022Z · LW · GW

I know that the humans forced to smile are not happy (and I know all the mistakes they've made while programming me, I know what they should've done instead), but I don't believe that they are not happy.

These are different senses of "happy." It should really read:

I know forcing humans to smile doesn't make them $\text{happy}_1$, and I know what they should've written instead to get me to optimize for $\text{happy}_1$ as they intended, but they are $\text{happy}_2$.

They're different concepts, so there's no strangeness here. The AGI knows what you meant to do; it just cares about the different thing you accidentally instilled in it, and so doesn't care about what you wanted.

Comment by David Udell on David Udell's Shortform · 2022-10-13T02:29:22.887Z · LW · GW

The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?

Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structure: large parts of it are not interconnected at all. For example, the association of historical events with a time sequence is not automatic; the writer has had the experience of seeing a child, who knew about ancient Egypt and had studied pictures of the treasures from the tomb of Tutankhamen, nevertheless coming home from school with a puzzled expression and asking: ‘Was Abraham Lincoln the first person?’

It had been explained to him that the Egyptian artifacts were over 3000 years old, and that Abraham Lincoln was alive 120 years ago; but the meaning of those statements had not registered in his mind. This makes us wonder whether there may be primitive cultures in which the adults have no conception of time as something extending beyond their own lives. If so, that fact might not have been discovered by anthropologists, just because it was so unexpected that they would not have raised the question.

As learning proceeds, the lattice develops more and more points (propositions) and interconnecting lines (relations of comparability), some of which will need to be modified for consistency in the light of later knowledge. By developing a lattice with denser and denser structure, one is making his scale of plausibilities more rigidly defined.

No adult ever comes anywhere near to the degree of education where he would perceive relationships between all possible propositions, but he can approach this condition with some narrow field of specialization. Within this field, there would be a ‘quasi-universal comparability’, and his plausible reasoning within this field would approximate that given by the Laplace–Bayes theory.

A brain might develop several isolated regions where the lattice was locally quite dense; for example, one might be very well-informed about both biochemistry and musicology. Then for reasoning within each separate region, the Laplace–Bayes theory would be well-approximated, but there would still be no way of relating different regions to each other.

Then what would be the limiting case as the lattice becomes everywhere dense with truly universal comparability? Evidently, the lattice would then collapse into a line, and some unique association of all plausibilities with real numbers would then be possible. Thus, the Laplace–Bayes theory does not describe the inductive reasoning of actual human brains; it describes the ideal limiting case of an ‘infinitely educated’ brain. No wonder that we fail to see how to use it in all problems!

This speculation may easily turn out to be nothing but science fiction; yet we feel that it must contain at least a little bit of truth. As in all really fundamental questions, we must leave the final decision to the future.

--E. T. Jaynes, Probability Theory (pp. 659-60)
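One way to make the lattice picture concrete, loosely and under my own assumptions: treat each pairwise plausibility judgment as a directed edge, take the transitive closure, and check whether every pair of propositions has become comparable. Only then can the propositions be placed on a single line. This is a hypothetical sketch (the function names are made up), and it assumes the judgments are already consistent, i.e. contain no cycles.

```python
from itertools import combinations


def transitive_closure(props, judged_more_plausible):
    """Warshall's algorithm over judgments (a, b) meaning 'a is more plausible than b'."""
    reach = set(judged_more_plausible)
    for k in props:
        for i in props:
            for j in props:
                if (i, k) in reach and (k, j) in reach:
                    reach.add((i, j))
    return reach


def universally_comparable(props, judged_more_plausible):
    """True iff every pair of propositions is ordered, directly or by transitivity."""
    reach = transitive_closure(props, judged_more_plausible)
    return all((a, b) in reach or (b, a) in reach for a, b in combinations(props, 2))


def place_on_a_line(props, judged_more_plausible):
    """If the order is total, map each proposition to a real number
    (here: how many propositions it outranks); otherwise return None."""
    if not universally_comparable(props, judged_more_plausible):
        return None
    reach = transitive_closure(props, judged_more_plausible)
    return {p: float(sum((p, q) in reach for q in props)) for p in props}


props = ["A", "B", "C", "D"]
sparse = {("A", "B"), ("C", "D")}              # two isolated regions
dense = {("A", "B"), ("B", "C"), ("C", "D")}   # a chain linking everything
print(place_on_a_line(props, sparse))  # None
print(place_on_a_line(props, dense))   # {'A': 3.0, 'B': 2.0, 'C': 1.0, 'D': 0.0}
```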

Comment by David Udell on Don't leave your fingerprints on the future · 2022-10-11T17:33:51.239Z · LW · GW

Yeah, fair -- I dunno. I do know that one incremental improvement on simulating a bunch of people philosophizing in an environment is doing that while also running an algorithm that prevents coercion, for example.

I imagine that the complete theory of these incremental improvements (for example, also not running a bunch of moral patients for many subjective years while computing the CEV) is the final theory we're after, but I don't have it.

Comment by David Udell on Don't leave your fingerprints on the future · 2022-10-08T03:26:15.196Z · LW · GW

Then that isn't the CEV operation.

The CEV operation tries to return a fixed point of idealized value-reflection. Running immortal people forward inside of a simulated world is very much insufficiently idealized value-reflection, for the reasons you suggest, so simply simulating people interacting for a long time isn't running their CEV.

Comment by David Udell on David Udell's Shortform · 2022-10-04T01:53:40.852Z · LW · GW

Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.

Comment by David Udell on David Udell's Shortform · 2022-09-29T21:57:30.821Z · LW · GW

Back and Forth

Only make choices that you would not make in reverse, if things were the other way around. Drop out of school if and only if you wouldn't enroll in school from out of the workforce. Continue school if and only if you'd switch over from work to that level of schooling.

Flitting back and forth between both possible worlds can make you less cagey about doing what's overdetermined by your world model + utility function already. It's also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.

Comment by David Udell on Dath Ilan's Views on Stopgap Corrigibility · 2022-09-28T00:22:43.905Z · LW · GW

I had assumed the first -- they're afraid of imperfect-values lock-in. I think it's the "not to the problem of preventing complete disaster" phrase that tipped me off here.

Comment by David Udell on Simulators · 2022-09-26T02:29:29.024Z · LW · GW

The verdict that knowledge is purely a property of configurations cannot be naively generalized from real life to GPT simulations, because “physics” and “configurations” play different roles in the two (as I’ll address in the next post). The parable of the two tests, however, literally pertains to GPT. People have a tendency to draw erroneous global conclusions about GPT from behaviors which are in fact prompt-contingent, and consequently there is a pattern of constant discoveries that GPT-3 exceeds previously measured capabilities given alternate conditions of generation[29], which shows no signs of slowing 2 years after GPT-3’s release.

Making the ontological distinction between GPT and instances of text which are propagated by it makes these discoveries unsurprising: obviously, different configurations will be differently capable and in general behave differently when animated by the laws of GPT physics. We can only test one configuration at once, and given the vast number of possible configurations that would attempt any given task, it’s unlikely we’ve found the optimal taker for any test.

Reading this was causally responsible for me undoing any updates I made after being disappointed by my playing with GPT-3. Those observations weren't more likely inside a weak-GPT world, because a strong-GPT would just as readily simulate weak-simulacra in my contexts as it would strong-simulacra in other contexts.

I think I had all the pieces to infer this... but some subverbal part of my cognition was illegitimately epistemically nudged by the manifest limitations of naïvely prompted GPT. That part of me, I now see, should only have been epistemically pushed around by quite serious, professional toying with GPT!
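The (non-)update can be sketched with Bayes' rule over two hypotheses. If strong and weak GPT are equally likely to produce my disappointing transcripts, the likelihoods cancel and the posterior equals the prior. The probabilities below are made-up placeholders, not estimates.

```python
def posterior(prior_strong: float, p_obs_if_strong: float, p_obs_if_weak: float) -> float:
    """Bayes' rule over two hypotheses: returns P(strong GPT | observations)."""
    joint_strong = prior_strong * p_obs_if_strong
    joint_weak = (1 - prior_strong) * p_obs_if_weak
    return joint_strong / (joint_strong + joint_weak)


# Strong GPT simulates weak simulacra in my contexts just as readily,
# so the observations are equally likely under both hypotheses: no update.
print(posterior(0.5, p_obs_if_strong=0.9, p_obs_if_weak=0.9))  # 0.5

# The mistaken update treats the observations as less likely under strong GPT.
print(posterior(0.5, p_obs_if_strong=0.3, p_obs_if_weak=0.9))  # about 0.25
```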

Comment by David Udell on David Udell's Shortform · 2022-09-25T18:01:24.977Z · LW · GW

Reflexively check both sides of the proposed probability of an event:

"What do I think about P(DOOM) = 81%?"


"What do I think about P(~DOOM) = 19%?"

This can often elicit feedback from parts of you that would stay silent if you only considered one way of stating the probability in question.

Comment by David Udell on David Udell's Shortform · 2022-09-25T17:29:15.072Z · LW · GW

When the sanity waterline is so low, it's easy to develop a potent sense of misanthropy.

Bryan Caplan's writing about many people hating stupid people really affected me on this point. Don't hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo's comparative advantage theorem. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
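A standard two-good illustration of comparative advantage, with hypothetical numbers: one producer is better at making both goods, yet total output still rises when each specializes according to opportunity cost. The less able producer gives up less X per unit of Y, so it should make Y.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Producer:
    x_per_hour: float  # units of good X produced per hour
    y_per_hour: float  # units of good Y produced per hour

    def opportunity_cost_of_y(self) -> float:
        """Units of X forgone for each unit of Y produced."""
        return self.x_per_hour / self.y_per_hour


def total_output(producers, hours_on_x, hours=1.0):
    """Total (X, Y) when producer i spends hours_on_x[i] of `hours` on X, the rest on Y."""
    x = sum(p.x_per_hour * h for p, h in zip(producers, hours_on_x))
    y = sum(p.y_per_hour * (hours - h) for p, h in zip(producers, hours_on_x))
    return x, y


able = Producer(x_per_hour=6, y_per_hour=4)       # better at both goods
less_able = Producer(x_per_hour=1, y_per_hour=2)  # worse at both goods
team = [able, less_able]

# Without trade, each splits their hour evenly.
print(total_output(team, [0.5, 0.5]))   # (3.5, 3.0)
# Specializing by comparative advantage: less_able makes only Y,
# able makes the remaining Y and spends the rest of its hour on X.
print(total_output(team, [0.75, 0.0]))  # (4.5, 3.0)
```

The extra unit of X is surplus that the two can split through trade at any price between 0.5 and 1.5 units of X per unit of Y, leaving both better off.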

Comment by David Udell on David Udell's Shortform · 2022-09-25T16:45:54.313Z · LW · GW

I note that Eliezer thinks that corrigibility is one currently-impossible-to-instill-in-an-AGI property that humans actually have. The sum total of human psychology... consists of many such impossible-to-instill properties.

This is why we should want to accomplish one impossible thing, as our stopgap solution, rather than aiming for all the impossible things at the same time, on our first try at aligning the AGI.

Comment by David Udell on David Udell's Shortform · 2022-09-25T16:41:17.426Z · LW · GW

I hereby confer on you, reader, the shroud of epistemic shielding from predictably misleading statements. It grants irrevocable, invokable protection from having to think about predictably confused claims ever again.

Take those cognitive cycles saved, and spend them well!

Comment by David Udell on David Udell's Shortform · 2022-09-25T16:33:59.989Z · LW · GW

I've noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: "If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?"

The thing is, we live in a world with looming powerful AI. It's at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we're not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist... so something's up here. I think some part of me has been illegitimately putting his thumb on an epistemic scale.