I sampled hundreds of short context snippets from openwebtext, and measured ablation effects averaged over those sampled forward-passes. Averaged over those hundreds of passes, I didn't see any real signal in the logit effects, just a layer of noise due to the ablations.
More could definitely be done on this front. I just tried something relatively quickly that fit inside of GPU memory and wanted to report it here.
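For concreteness, here's roughly the shape of the measurement (a sketch for illustration, not the actual code I ran: the autoencoder below is a randomly initialized stand-in, and the layer index, dictionary size, and ablated dimension are arbitrary choices):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

# Stand-in sparse autoencoder over the layer-6 MLP output (random weights, illustration only).
d_model, d_dict, DIM = 768, 8192, 123
enc = torch.nn.Linear(d_model, d_dict)
dec = torch.nn.Linear(d_dict, d_model)

ablate = False
def hook(module, inputs, output):
    # Pass the MLP output through the autoencoder; optionally zero-ablate one dictionary dimension.
    z = torch.relu(enc(output))
    if ablate:
        z[..., DIM] = 0.0
    return dec(z)

model.transformer.h[6].mlp.register_forward_hook(hook)

# Stand-ins for the sampled openwebtext snippets.
snippets = ["The quick brown fox jumps over the lazy dog.", "In 1905, Einstein published four papers."]

diffs = []
with torch.no_grad():
    for text in snippets:
        ids = tok(text, return_tensors="pt").input_ids
        ablate = False
        base_logits = model(ids).logits
        ablate = True
        ablated_logits = model(ids).logits
        # Mean logit shift per vocabulary token, averaged over positions in this snippet.
        diffs.append((ablated_logits - base_logits).mean(dim=(0, 1)))

# Average the ablation effect over all sampled forward passes, then inspect the extremes.
mean_effect = torch.stack(diffs).mean(dim=0)
print(mean_effect.topk(5))  # with a random stand-in autoencoder, this is just noise
```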
Could you hotlink the boxes on the diagrams to that, or add the resulting content as hover text over areas in them, or something? This might be hard to do on LW: I suspect some JavaScript code might be required for this sort of thing, but perhaps a library exists for it?
My workaround was to have the dimension links laid out below each figure.
My current "print to flat .png" approach wouldn't support hyperlinks, and I don't think LW supports .svg images.
That line was indeed quite poorly phrased. It now reads:
At the bottom of the box, blue or red token boxes show the tokens most promoted (blue) and most suppressed (red) by that dimension.
That is, you're right. Interpretability data on an autoencoder dimension comes from seeing which token probabilities are most promoted and suppressed when that dimension is ablated, relative to leaving its activation value alone. That's an ablation effect sign, so the implied, plotted promotion effect signs are flipped.
The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.
The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards.
So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it's important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil... but this defeats the purpose of those values. It's not practical benevolence if you had to ask for the danger to be left in place; it's not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you.
Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you'll have to live with all your "localistic" values satisfied but meaning mostly absent.
It helps to see this meaning thing if you frame it alongside all the other objectivistic "stretch goal" values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being.
Considerations that in today's world are rightly dismissed as frivolous may well, once more pressing problems have been resolved, emerge as increasingly important [remaining] lodestars... We could and should then allow ourselves to become sensitized to fainter, subtler, less tangible and less determinate moral and quasi-moral demands, aesthetic impingings, and meaning-related desirables. Such recalibration will, I believe, enable us to discern a lush normative structure in the new realm that we will find ourselves in—revealing a universe iridescent with values that are insensible to us in our current numb and stupefied condition (pp. 318-9).
I believe I and others here probably have a lot to learn from Chris, and arguments of the form "Chris confidently believes false thing X," are not really a crux for me about this.
Would you kindly explain this? Because you think some of his world-models independently throw out great predictions, even if other models of his are dead wrong?
Use your actual morals, not your model of your morals.
I agree that stronger, more nuanced interpretability techniques should tell you more. But, when you see something like, e.g.,
25132 ▁vs, ▁differently, ▁compared, ▁greater, all, ▁per
25134 ▁I, ▁My, I, ▁personally
isn't it pretty obvious what those two autoencoder neurons were each doing?
No, towards an value. is the training proxy for that, though.
Epistemic status: Half-baked thought.
Say you wanted to formalize the concepts of "inside and outside views" to some degree. You might say that your inside view is a Bayes net or joint conditional probability distribution—this mathematical object formalizes your prior.
Unlike your inside view, your outside view consists of forms of deferring to outside experts. The Bayes nets that inform their thinking are sealed away, and you can't inspect these. You can ask outside experts to explain their arguments, but there's an interaction cost associated with inspecting the experts' views. Realistically, you never fully internalize an outside expert's Bayes net.
Crucially, this means you can't update their Bayes net after conditioning on a new observation! Model outside experts as observed assertions (claiming whatever). These assertions are potentially correlated with other observations you make. But because you have little of the prior that informs those assertions, you can't update the prior when it's right (or wrong).
To the extent that it's expensive to theorize about outside experts' reasoning, the above model explains why you want to use and strengthen your inside view (instead of just deferring to outside really smart people). It's because your inside view will grow stronger with use, but your outside view won't.
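A toy sketch of the distinction, with made-up numbers (not a serious model): my inside view is a joint distribution I can condition on and keep refining, while the expert shows up only as an assertion whose generating model I can't open up.

```python
# Inside view: my own joint prior over a hypothesis H and an observation O.
# These numbers are mine to inspect, condition on, and revise.
p_h = {True: 0.30, False: 0.70}
p_o_given_h = {True: 0.80, False: 0.20}               # P(O=True | H)

# Outside view: an expert asserts "H" or "not H". All I model is how often the
# assertion tracks the truth -- not the reasoning that produced it.
p_assert_given_h = {True: 0.90, False: 0.15}          # P(expert asserts H | H)

def posterior_h(observed_o=None, expert_asserts_h=None):
    """Condition my joint distribution on whatever I actually get to see."""
    weights = {}
    for h in (True, False):
        w = p_h[h]
        if observed_o is not None:
            w *= p_o_given_h[h] if observed_o else 1 - p_o_given_h[h]
        if expert_asserts_h is not None:
            w *= p_assert_given_h[h] if expert_asserts_h else 1 - p_assert_given_h[h]
        weights[h] = w
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# I can keep improving p_o_given_h as I learn -- the inside view strengthens with use.
# But p_assert_given_h is a sealed box: when the expert turns out right or wrong,
# I can't reach inside and update the prior that generated their assertion.
print(posterior_h(observed_o=True))
print(posterior_h(observed_o=True, expert_asserts_h=False))
```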
(Great project!) I strongly second the RSS feed idea, if that'd be possible.
I think that many (not all) of your above examples boil down to optimizing for legibility rather than optimizing for goodness. People who hobnob instead of working quietly will get along with their bosses better than their quieter counterparts, yes. But a company of brown nosers will be less productive than a competitor company of quiet hardworking employees! So there's a cooperate/defect-dilemma here.
What that suggests, I think, is that you generally shouldn't immediately defect as hard as possible with regard to optimizing for appearances. Play the prevailing local balance between optimizing-for-appearances and optimizing-for-outcomes that everyone around you plays, and try not to incrementally lower the level of org-wide cooperation. Try to eke that level of cooperation upward, and set up incentives accordingly.
The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers.
Two moments of growing in mathematical maturity I remember vividly:
- Realizing that equations are claims that are therefore either true or false. Everything asserted with symbols... could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word "is"!
- Learning about the objects that mathematical claims are about. Going from having to look up "Wait, what's a real number again?" to knowing how the standard number systems interrelate told me what we're making claims about. Of course, there are plenty of other mathematical objects -- but getting to know these objects taught me the general pattern.
I found it distracting that all your examples were topical, anti-red-tribe coded events. That reminded me of
In Artificial Intelligence, and particularly in the domain of nonmonotonic reasoning, there’s a standard problem: “All Quakers are pacifists. All Republicans are not pacifists. Nixon is a Quaker and a Republican. Is Nixon a pacifist?”
What on Earth was the point of choosing this as an example? To rouse the political emotions of the readers and distract them from the main question? To make Republicans feel unwelcome in courses on Artificial Intelligence and discourage them from entering the field? (And no, I am not a Republican. Or a Democrat.)
Why would anyone pick such a distracting example to illustrate nonmonotonic reasoning? Probably because the author just couldn’t resist getting in a good, solid dig at those hated Greens. It feels so good to get in a hearty punch, y’know, it’s like trying to resist a chocolate cookie.
As with chocolate cookies, not everything that feels pleasurable is good for you.
That is, I felt reading this like there were tribal-status markers mixed in with your claims that didn't have to be there, and that struck me as defecting on a stay-non-politicized discourse norm.
2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.
a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs…
12. The principal of a private school is a member of Planned Parenthood and, off-duty, speaks out about contraception and the morning after pill. The board of the private school decides this is inappropriate given the school’s commitment to abstinence and moral education and asks the principal to stop these speaking engagements or step down from his position.
a) The school board is acting within its rights; they can insist on a principal who shares their values
b) The school board should back off; it’s none of their business what he does in his free time…
[Difference] of 0 to 3: You are an Object-Level Thinker. You decide difficult cases by trying to find the solution that makes the side you like win and the side you dislike lose in that particular situation.
[Difference] of 4 to 6: You are a Meta-Level Thinker. You decide difficult cases by trying to find general principles that can be applied evenhandedly regardless of which side you like or dislike.
--Scott Alexander, "The Slate Star Codex Political Spectrum Quiz"
The Character of an Epistemic Prisoner's Dilemma
Say there are two tribes. The tribes hold fundamentally different values, but they also model the world in different terms. Each thinks members of the other tribe are mistaken, and that some of their apparent value disagreement would be resolved if the others' mistakes were corrected.
Keeping this in mind, let's think about inter-tribe cooperation and defection.
Ruling by Reference Classes, Rather Than Particulars
In the worst equilibrium, actors from each tribe evaluate political questions in favor of their own tribe, against the outgroup. In their world model, this is to a great extent for the benefit of the outgroup members as well.
But this is a shitty regime to live under when it's done back to you too, so rival tribes can sometimes come together to implement an impartial judiciary. The natural way to do this is to have a judiciary classifier rule for reference classes of situations, and to have a separate impartial classifier sort situations into reference classes.
You're locally worse off this way, but are globally much better off.
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with not too much more work. Given that, you won't want to update dramatically in favor of the claim -- the powerful evidence to the contrary could, you infer, be unearthed without much more work. You learn something about the other side of the issue from how quickly or slowly the world yielded evidence in the other direction. If it's considered a social faux pas to give strong arguments for one side of a claim, then your prior about how hard it is to find strong arguments for that side of the claim will be doing a lot of the heavy lifting in fixing your world model. And so on, for the evidential consequences of other kinds of motivated search and rationalization.
In brief, you can do epistemically better than ignoring how much search power went into finding all the evidence. You can do better than only evaluating the object-level evidential considerations! You can take expended search into account, in order to model what evidence is likely hiding, where, behind how much search debt.
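A toy version of the calculation I have in mind, with invented numbers: the update you get from "a strong pro argument turned up after five minutes" depends a lot on how hard anyone searched the other side.

```python
from math import exp

# If the claim is true, strong pro-arguments are easy to unearth and strong
# con-arguments are rare; if it's false, vice versa. Rates are strong arguments
# per minute of search and are entirely made up.
RATES = {
    "true":  {"pro": 0.20, "con": 0.02},
    "false": {"pro": 0.05, "con": 0.20},
}

def p_find_within(minutes, rate):
    """Chance of turning up at least one strong argument in `minutes` of search."""
    return 1 - exp(-rate * minutes)

def posterior_true(prior, pro_minutes, found_pro, con_minutes, found_con):
    """P(claim | what was found AND how much search went into each side)."""
    likelihood = {}
    for world, r in RATES.items():
        lp = p_find_within(pro_minutes, r["pro"])
        lc = p_find_within(con_minutes, r["con"])
        likelihood[world] = (lp if found_pro else 1 - lp) * (lc if found_con else 1 - lc)
    joint_true = prior * likelihood["true"]
    joint_false = (1 - prior) * likelihood["false"]
    return joint_true / (joint_true + joint_false)

# A strong pro-argument found in 5 minutes, with the con side searched hard for an
# hour and coming up empty, is a big update toward the claim...
print(posterior_true(0.5, 5, True, 60, False))   # ~0.9999
# ...but the same pro-argument, when the con side was barely searched (or it was a
# faux pas to search it), moves you much less.
print(posterior_true(0.5, 5, True, 1, False))    # ~0.77
```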
Modest spoilers for planecrash (Book 9 -- null action act II).
Nex and Geb had each INT 30 by the end of their mutual war. They didn't solve the puzzle of Azlant's IOUN stones... partially because they did not find and prioritize enough diamonds to also gain Wisdom 27. And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion's spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly measured by Detect Thoughts nor by tests of legible ability at using existing math. (Keltham has slightly above-average intelligence for dath ilan, reflectivity well below average, and an ordinary amount of that spark.)
But most of all, Nex and Geb didn't solve IOUN stones because they didn't come from a culture that had already developed digital computation and analog signal processing. Or on an even deeper level - because those concepts can't really be that hard at INT 30, even if your WIS is much lower and you are missing some sparks - they didn't come from a culture which said that inventing things like that is what the Very Smart People are supposed to do with their lives, nor that Very Smart People are supposed to recheck what their society told them were the most important problems to solve.
Nex and Geb came from a culture which said that incredibly smart wizards were supposed to become all-powerful and conquer their rivals; and invent new signature spells that would be named after them forever after; and build mighty wizard-towers, and raise armies, and stabilize impressively large demiplanes; and fight minor gods, and surpass them; and not, particularly, question society's priorities for wizards. Nobody ever told Nex or Geb that it was their responsibility to be smarter than the society they grew up in, or use their intelligence better than common wisdom said to use it. They were not prompted to look in the direction of analog signal processing; and, more importantly in the end, were not prompted to meta-look around for better directions to look, or taught any eld-honed art of meta-looking.
--Eliezer, planecrash
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.
In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom's motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom's motion that the dimension in that direction vanished. That left only three dimensions of space -- all perpendicular to the atom's direction of motion -- and the ghost of the lost fourth dimension, which makes itself felt as the current of time. Now atoms moving in different directions cannot share the same directional flow of time. Each takes on the particular current it perceives as the proper measure of time.
…
You measure only... as projected on your time and space dimensions.
--Lewis Carroll Epstein, Relativity Visualized (1997)
Past historical experience and brainstorming about human social orders probably barely scratches the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.
[1] (Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)
Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.
This post crystallized some thoughts that have been floating in my head, inchoate, since I read Zvi's stuff on slack and Valentine's "Here's the Exit."
Part of the reason it's so hard to update on these 'creative slack' ideas is that we make deals among our momentary mindsets to work hard when it's work-time. (And when the end of the world is literally at stake, it's always work-time.) "Being lazy" is our label for someone who hasn't established that internal deal between their varying mindsets, and so is flighty and hasn't precommitted to getting stuff done even when they aren't currently excited about work.
Once you've installed that internal flinch away from not working/precommitment to work anyways, though, it's hard to accept that hard work is ever a mistake, because that seems like your current mindset trying to rationalize its way out of cooperating today!
I think I finally got past this flinch/got out of running that one particular internal status race, thanks to this and the aforementioned posts.
A model I picked up from Eric Schwitzgebel.
The humanities used to be highest-status in the intellectual world!
But then, scientists quite visibly exploded fission weapons and put someone on the moon. It's easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.
"Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search."
In the 1920s, when λ and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts on an arbitrary input-object. (The rules need not produce an output for every input.) A simple example is the permutation-operation defined by
.
Nowadays one would think of a computer program, though the 'operation-process' concept was not originally intended to have the finiteness and effectiveness limitations that are involved with computation.
…
Perhaps the most important difference between operators and functions is that an operator may be defined by describing its action without defining the set of inputs for which this action produces results, i.e., without defining its domain. In a sense, operators are 'partial functions.'
A second important difference is that some operators have no restriction on their domain; they accept any inputs, including themselves. The simplest example is $I$, which is defined by the operation of doing nothing at all. If this is accepted as a well-defined concept, then surely the operation of doing nothing can be applied to it. We simply get
$I(I) = I$.
…
Of course, it is not claimed that every operator is self-applicable; this would lead to contradictions. But the self-applicability of at least such simple operators as , , and seems very reasonable.
…
The operator concept can be modelled in standard ZF set theory if, roughly speaking, we interpret operators as infinite sequences of functions (satisfying certain conditions), instead of as single functions. This was discovered by Dana Scott in 1969 (pp. 45-6).
--Hindley and Seldin, Lambda-Calculus and Combinators (2008)
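Untyped functions in a programming language give a quick feel for the operator picture; in Python, say, the "do nothing" operator really can be applied to itself:

```python
# The identity "operator": specified by its action alone, with no domain attached.
def I(x):
    return x

# Self-application is unproblematic: doing nothing to "doing nothing" does nothing.
assert I(I) is I

# The set-of-ordered-pairs picture, by contrast, demands a fixed domain up front,
# which is what makes naive self-application awkward there.
```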
Given a transformer model, it's probably possible to find a reasonably concise energy function (probably of a similar OOM of complexity as the model weights themselves) whose minimization corresponds to executing forwards passes of the transformer. However, this [highly compressive] energy function wouldn't tell you much about what the personas simulated by the model "want" or how agentic they were, since the energy function is expressed in the ontology of model weights and activations, not an agent's beliefs / goals. [This has] the type signature of a utility function that meaningfully compresses a system's behavior, without... telling you much about the long term behavior / goals of the system.
When I think about the powerful AGI taking over the lightcone, I can definitely see it efficiently juggling familiar resources between nodes in space. E.g., it'll want to build power collectors of some description around the sun and mine the asteroids. I can understand that AGI as a resource inventory whose feelers grow its resource stocks with time. The AGI's neural network can also be accurately modeled as an energy function being minimized, expressed in terms of neural network stuffs instead of in familiar resources.
I wouldn't be terribly surprised if something similar was true for human brains, too. I can model people as steadily accruing social-world resources, like prestige, propriety, money, attractiveness, etc. There's perhaps also some tidy neural theory, expressed in an alien mathematical ontology, that very compactly predicts an arbitrary actual brain's motor outputs.
I guess I'm used to modeling people as coherent behavioral profiles with respect to social resources because social resources are an abstraction I have. (I don't know what given social behaviors would imply about neural outputs expressed in wholly neural ontology, if anything.) If I had some other suite of alien mathematical abstractions that gave me precognitive foresight into people's future motor outputs, and I could practically operate those alien abstractions, I'd probably switch over to entirely modeling people that way instead. Until I have those precog math abstractions, I have to keep modeling people in the ontology of familiar features, i.e. social resources.
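To make the quoted energy-function point concrete with a toy stand-in (a tiny random MLP rather than a real transformer, and a deliberately trivial energy): the function below is minimized exactly by "running the forward pass", yet it's written entirely in the ontology of activations and says nothing about goals.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(32, 16))
W2 = rng.normal(size=(8, 32))

def forward(x):
    """Stand-in for a network's forward pass."""
    return W2 @ np.tanh(W1 @ x)

def energy(y, x):
    """An energy over candidate outputs y whose argmin is the forward-pass output."""
    return float(np.sum((y - forward(x)) ** 2))

x = rng.normal(size=16)
target = forward(x)

# Minimizing the energy by gradient descent on y just recovers the forward pass.
y = np.zeros(8)
for _ in range(2000):
    y -= 0.1 * 2 * (y - target)           # dE/dy = 2 (y - target)

print(np.allclose(y, target, atol=1e-8))  # True: argmin_y E(y; x) == forward(x)
print(energy(y, x))                       # ~0
```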
It seems totally plausible to me that an outwardly sclerotic DMV that never goes out of its way to help the public could still have tight internal coordination and close ranks to thwart hostile management, and that an outwardly helpful / flexible DMV that focuses on the spirit of the law might fail to do so.
I completely agree, or at least that isn't a crux for me here. I'm confused about the extent to which I should draw inferences about AGI behavior from my observations of large human organizations. I feel like that's the wrong thing to analogize to. Like, if you can find ~a human brain via gradient descent, you can find a different better nearby brain more readily than you can find a giant organization of brains satisfying some behavioral criteria. Epistemic status: not very confident. Anyways, the analogy between AGI and organizations seems weak, and I didn't intend for it to be a more-than-illustrative, load-bearing part of the post's argument.
Similarly, do top politicians seem to have particularly "consequentialist" cognitive styles? If consequentialist thinking and power accumulation actually do go together hand in hand, then we should expect top politicians to be disproportionately very consequentialist. But if I think about specific cognitive motions that I associate with the EY-ish notion of "consequentialism", I don't think top politicians are particularly inclined towards such motions. E.g., how many of them "actively work on becoming ever more consequentialist"? Do they seem particularly good at having coherent internal beliefs? Or a wide range of competence in many different (seemingly) unrelated domains?
I think the model takes a hit here, yeah... though I don't wholly trust my own judgement of top politicians, for politics-is-the-mindkiller reasons. I'm guessing there's an elephant in the brain thing here where, like in flirting, you have strong ancestral pressures to self-deceive and/or maintain social ambiguity about your motives. I (maybe) declare, as an ex post facto epicycle, that human tribal politics is weird (like human flirting and a handful of other ancestral-signaling-heavy domains).
Business leaders do strike me as disproportionately interested in outright self-improvement and in explicitly improving the efficiency of their organization and their own work lives. Excepting the above epicycles, I also expect business leaders to have notably-better-than-average internal maps of the local territory and better-than-average competence in many domains. Obviously, there are some significant confounds there, but still.
This is a great theorem that's stuck around in my head this last year! It's presented clearly and engagingly, but more importantly, the ideas in this piece are suggestive of a broader agent foundations research direction. If you wanted to intimate that research direction with a single short post that additionally demonstrates something theoretically interesting in its own right, this might be the post you'd share.
This post has successfully stuck around in my mind for two years now! In particular, it's made me explicitly aware of the possibility of flinching away from observations because they're normie-tribe-coded.
I think I deny the evidence on most of the cases of dogs generating complex English claims. But it was epistemically healthy for that model anomaly to be rubbed in my face, rather than filter-bubbled away plus flinched away from and ignored.
This is a fantastic piece of economic reasoning applied to a not-flagged-as-economics puzzle! As the post says, a lot of its content is floating out there on the internet somewhere: the draw here is putting all those scattered insights together under their common theory of the firm and transaction costs framework. In doing so, it explicitly hooked up two parts of my world model that had previously remained separate, because they weren't obviously connected.
Complex analysis is the study of functions of a complex variable, i.e., functions $f(z)$ where $z$ and $f(z)$ lie in $\mathbb{C}$. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.
--Pugh, Real Mathematical Analysis (p. 28)
One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.
If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can only get that person to engage with it in terms of Bayesian epistemology, rather than decision theory, if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increased. If you don't, your counterparty isn't going to treat the exchange as a good-faith interaction, and they're going to stay in a bad-faith, "arguments as soldiers" conversational mode instead.
When a community puts in the hard work of cooperating to maintain a strong epistemic commons, you don't have to put as much effort into your communications protocol to get a model across. When a community's collective epistemology is degraded, you have to do this work, always packaging your points just so, as the price of communicating.
Thanks -- right on both counts! Post amended.
An Inconsistent Simulated World
I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).
Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.
What are the flaws you can notice inside your simulated world?
Physics is internally consistent. But your model of the physical world almost certainly isn't! And your world-model doesn't feel like just a model... it's instead just how the world is. What inconsistencies -- there's at least one -- can you see in the world you live in? (If you lived in an inconsistent simulated world, would you notice?)
When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
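The Bayesian gloss, in odds form:

$$\frac{P(\text{claim} \mid \text{article})}{P(\neg\text{claim} \mid \text{article})} \;=\; \frac{P(\text{article} \mid \text{claim})}{P(\text{article} \mid \neg\text{claim})} \cdot \frac{P(\text{claim})}{P(\neg\text{claim})}.$$

If an equally persuasive article would have been produced whether or not the claim were true, the likelihood ratio is about 1, and the posterior odds just equal the prior odds.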
Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.
This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible to your peers and managers. If you have something to protect, though, keep your eye squarely on the ball and optimize for EV, not directly for legible appearances.
A multiagent Extrapolated Volitionist institution is something that computes and optimizes for a Convergent Extrapolated Volition, if a CEV exists.
Really, though, the above Extrapolated Volitionist institutions do take other people into consideration. They either give everyone the Schelling weight of one vote in a moral parliament, or they take into consideration the epistemic credibility of other bettors as evinced by their staked wealth, or other things like that.
Sometimes the relevant interpersonal parameters can be varied, and the institutional designs don't weigh in on that question. The ideological emphasis is squarely on individual considered preferences -- that is the core insight of the outlook. "Have everyone get strictly better outcomes by their lights, probably in ways that surprise them but would be endorsed by them after reflection and/or study."
Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.
Extrapolated Volitionist institutions are all characteristically "meta": they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.
Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!
Stress and time-to-burnout are resources to be juggled, like any other.
“What is the world trying to tell you?”
I've found that this prompt helps me think clearly about the evidence shed by the generator of my observations.
As Gauss stressed long ago, any kind of singular mathematics acquires a meaning only as a limiting form of some kind of well-behaved mathematics, and it is ambiguous until we specify exactly what limiting process we propose to use. In this sense, singular mathematics has necessarily a kind of anthropomorphic character; the question is not what is it, but rather how shall we define it so that it is in some way useful to us?
--E. T. Jaynes, Probability Theory (p. 108)
Bogus nondifferentiable functions
The case most often cited as an example of a nondifferentiable function is derived from a sequence $\{f_n(x)\}$, each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length $1/n$. As $n \to \infty$, the triangles shrink to zero size. For any finite $n$, the slope of $f_n(x)$ is $\pm 1$ almost everywhere. Then what happens as $n \to \infty$? The limit $f(x)$ is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivative, $\lim_{n \to \infty} f_n'(x)$, does not exist; but it is the derivative of the limit that is in question here, $f'(x)$, and this is certainly differentiable. Any number of such sequences with discontinuous slope on a finer and finer scale may be defined. The error of calling the resulting limit nondifferentiable, on the grounds that the limit of the derivative does not exist, is common in the literature. In many cases, the limit of such a sequence of bad functions is actually a well-behaved function (although awkwardly defined), and we have no reason to exclude it from our system.
Lebesgue defended himself against his critics thus: ‘If one wished always to limit himself to the consideration of well-behaved functions, it would be necessary to renounce the solution of many problems which were proposed long ago and in simple terms.’ The present writer is unable to cite any specific problem which was thus solved; but we can borrow Lebesgue’s argument to defend our own position.
To reject limits of sequences of good functions is to renounce the solution of many current real problems. Those limits can and do serve many useful purposes, which much current mathematical education and practice still tries to stamp out. Indeed, the refusal to admit delta-functions as legitimate mathematical objects has led mathematicians into error...
But the definition of a discontinuous function which is appropriate in analysis is our limit of a sequence of continuous functions. As we approach that limit, the derivative develops a higher and sharper spike. However close we are to that limit, the spike is part of the correct derivative of the function, and its contribution must be included in the exact integral...
It is astonishing that so few non-physicists have yet perceived this need to include delta-functions, but we think it only illustrates what we have observed independently; those who think of fundamentals in terms of set theory fail to see its limitations because they almost never get around to useful, substantive calculations.
So, bogus nondifferentiable functions are manufactured as limits of sequences of rows of tinier and tinier triangles, and this is accepted without complaint. Those who do this while looking askance at delta-functions are in the position of admitting limits of sequences of bad functions as legitimate mathematical objects, while refusing to admit limits of sequences of good functions! This seems to us a sick policy, for delta-functions serve many essential purposes in real, substantive calculations, but we are unable to conceive of any useful purpose that could be served by a nondifferentiable function. It seems that their only use is to provide trouble-makers with artificially contrived counter-examples to almost any sensible and useful mathematical statement one could make. Henri Poincaré (1909) noted this in his characteristically terse way:
In the old days when people invented a new function they had some useful purpose in mind: now they invent them deliberately just to invalidate our ancestors’ reasoning, and that is all they are ever going to get out of them.
We would point out that those trouble-makers did not, after all, invalidate our ancestors’ reasoning; their pathology appeared only because they adopted, surreptitiously, a different definition of the term ‘function’ than our ancestors used. Had this been pointed out, it would have been clear that there was no need to modify our ancestors’ conclusions...
Note, therefore, that we stamp out this plague too, simply by our defining the term ‘function’ in the way appropriate to our subject. The definition of a mathematical concept that is ‘appropriate’ to some field is the one that allows its theorems to have the greatest range of validity and useful applications, without the need for a long list of exceptions, special cases, and other anomalies. In our work the term ‘function’ includes good functions and well-behaved limits of sequences of good functions; but not nondifferentiable functions. We do not deny the existence of other definitions which do include nondifferentiable functions, any more than we deny the existence of fluorescent purple hair dye in England; in both cases, we simply have no use for them.
--E. T. Jaynes, Probability Theory (2003, pp. 669-71)
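One concrete such sequence, in my notation rather than Jaynes's:

$$f_n(x) \;=\; \min_{k \in \mathbb{Z}} \left| x - \tfrac{k}{n} \right|, \qquad 0 \le f_n(x) \le \tfrac{1}{2n},$$

so the hypotenuses have length $1/n$, the slope $f_n'(x)$ is $\pm 1$ almost everywhere and $\lim_n f_n'(x)$ does not converge, yet $f_n \to f \equiv 0$ uniformly and the derivative of the limit is $f'(x) = 0$ everywhere.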
It's somewhat incredible to read this while simultaneously picking up some set theory. It reminds me not to absorb what's written in the high-status textbooks entirely uncritically, and to keep in mind that there's a good amount of convention behind what's in the books.
Epistemic status: politics, known mindkiller; not very serious or considered.
People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.
In the US, the 1st Amendment legally protects freedom of religion from the state. This can be modeled as a response to severe intertribal conflict: bake rules into your new state that forgo the benefits of persecuting your outgroup when you're in power, in exchange for some guarantee of not being persecuted yourself when some other tribe is in power. An extension of the spirit of the 1st Amendment to contemporary tribal conflicts would, then, protect "political-tribal freedom" from the state.
A full generalization of the Amendment would protect the "freedom of tribal affiliation and expression" from the state. For this to work, people would also have to have interpersonal best practices that mostly tolerate outgroup membership in most areas of private life, too.
The explicit definition of an ordered pair is frequently relegated to pathological set theory...
It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irrelevant properties that are accidental and distracting. The theorem that $(a, b) = (x, y)$ if and only if $a = x$ and $b = y$ is the sort of thing we expect to learn about ordered pairs. The fact that $\{a\} \in (a, b)$, on the other hand, seems accidental; it is a freak property of the definition rather than an intrinsic property of the concept.
The charge of artificiality is true, but it is not too high a price to pay for conceptual economy. The concept of an ordered pair could have been introduced as an additional primitive, axiomatically endowed with just the right properties, no more and no less. In some theories this is done. The mathematician's choice is between having to remember a few more axioms and having to forget a few accidental facts; the choice is pretty clearly a matter of taste. Similar choices occur frequently in mathematics...
--Paul R. Halmos, Naïve Set Theory (1960, p. 24-5)
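For reference, the explicit definition Halmos has in mind is Kuratowski's:

$$(a, b) \;:=\; \{\{a\}, \{a, b\}\},$$

which delivers the intended theorem $(a, b) = (x, y) \iff a = x \text{ and } b = y$, along with "freak" facts like $\{a\} \in (a, b)$ that are artifacts of the encoding rather than properties of the ordered-pair concept.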
Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?
Very cool! I have noticed that in arguments in ordinary academia people sometimes object that "that's so complicated" when I take a lot of deductive steps. I hadn't quite connected this with the idea that:
If you're confident in your assumptions ( is small), or if you're unconfident in your inferences ( is big), then you should penalise slow theories moreso than long theories, i.e. you should be a T-type.
I.e., that holding a T-type prior is adaptive when even your deductive inferences are noisy.
Also, I take it that this row of your table:
| Debate | K-types | T-types |
| --- | --- | --- |
| Analogies | Different systems will follow the same rules. | Different systems will follow the same rules. |
should read "...follow different rules." in the T-types column.
FWIW, this post strikes me as a very characteristically 'Hansonian' insight.
Now, whatever $T$ may assert, the fact that $T$ can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, $T$ could certainly be deduced from them!
This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s long and complicated proof.
Now suppose that the axioms contain an inconsistency. Then the opposite of $T$, and therefore the contradiction $T \wedge \neg T$, can also be deduced from them:
$A \Rightarrow \neg T$ [where $A$ is the conjunction of the axioms].
So, if there is an inconsistency, its existence can be proved by exhibiting any proposition $T$ and its opposite $\neg T$ that are both deducible from the axioms. However, in practice it may not be easy to find a $T$ for which one sees how to prove both $T$ and $\neg T$. Evidently, we could prove the consistency of a set of axioms if we could find a feasible procedure which is guaranteed to locate an inconsistency if one exists; so Gödel's theorem seems to imply that no such procedure exists. Actually, it says only that no such procedure derivable from the axioms of the system being tested exists.
--E. T. Jaynes, Probability Theory (p. 46), logical symbolism converted to standard symbols
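The step doing the work here is ex falso quodlibet: from a contradiction, anything follows,

$$(A \wedge \neg A) \;\Rightarrow\; B \quad \text{for any proposition } B,$$

so inconsistent axioms would imply $T$ just as readily as consistent ones, and deducing $T$ therefore tells you nothing about their consistency.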
I know that the humans forced to smile are not happy (and I know all the mistakes they've made while programming me, I know what they should've done instead), but I don't believe that they are not happy.
These are different senses of "happy." It should really read:
I know forcing humans to smile doesn't make them happy₁, and I know what they should've written instead to get me to optimize for happiness₁ as they intended, but they are happy₂.
They're different concepts, so there's no strangeness here. The AGI knows what you meant to do, it just cares about the different thing you accidently instilled in it, and so doesn't care about what you wanted.
The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?
Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structure: large parts of it are not interconnected at all. For example, the association of historical events with a time sequence is not automatic; the writer has had the experience of seeing a child, who knew about ancient Egypt and had studied pictures of the treasures from the tomb of Tutankhamen, nevertheless coming home from school with a puzzled expression and asking: ‘Was Abraham Lincoln the first person?’
It had been explained to him that the Egyptian artifacts were over 3000 years old, and that Abraham Lincoln was alive 120 years ago; but the meaning of those statements had not registered in his mind. This makes us wonder whether there may be primitive cultures in which the adults have no conception of time as something extending beyond their own lives. If so, that fact might not have been discovered by anthropologists, just because it was so unexpected that they would not have raised the question.
As learning proceeds, the lattice develops more and more points (propositions) and interconnecting lines (relations of comparability), some of which will need to be modified for consistency in the light of later knowledge. By developing a lattice with denser and denser structure, one is making his scale of plausibilities more rigidly defined.
No adult ever comes anywhere near to the degree of education where he would perceive relationships between all possible propositions, but he can approach this condition with some narrow field of specialization. Within this field, there would be a ‘quasi-universal comparability’, and his plausible reasoning within this field would approximate that given by the Laplace–Bayes theory.
A brain might develop several isolated regions where the lattice was locally quite dense; for example, one might be very well-informed about both biochemistry and musicology. Then for reasoning within each separate region, the Laplace–Bayes theory would be well-approximated, but there would still be no way of relating different regions to each other.
Then what would be the limiting case as the lattice becomes everywhere dense with truly universal comparability? Evidently, the lattice would then collapse into a line, and some unique association of all plausibilities with real numbers would then be possible. Thus, the Laplace–Bayes theory does not describe the inductive reasoning of actual human brains; it describes the ideal limiting case of an ‘infinitely educated’ brain. No wonder that we fail to see how to use it in all problems!
This speculation may easily turn out to be nothing but science fiction; yet we feel that it must contain at least a little bit of truth. As in all really fundamental questions, we must leave the final decision to the future.
--E. T. Jaynes, Probability Theory (p. 659-60)
Yeah, fair -- I dunno. I do know that an incremental improvement on simulating a bunch of people philosophizing in an environment is doing that while also running an algorithm that prevents coercion, e.g.
I imagine that the complete theory of these incremental improvements (for example, also not running a bunch of moral patients for many subjective years while computing the CEV), is the final theory we're after, but I don't have it.