Hyperreals in a Nutshell
post by Yudhister Kumar (randomwalks) · 2023-10-15T14:23:58.027Z · LW · GW · 27 comments
This is a link post for https://ykumar.org/hyperreals-in-a-nutshell/
Contents
- The Sequence Construction
- Almost All Agreement ft. Ultrafilters
- Yes, This Behaves Like The Real Numbers
- Infinitesimals and Infinitely Large Numbers
- How does this tie into calculus, exactly?
Epistemic status: Vaguely confused and probably lacking a sufficient technical background to get all the terms right. Is very cool though, so I figured I'd write this.
And what are these Fluxions? The Velocities of evanescent Increments? And what are these same evanescent Increments? They are neither finite Quantities nor Quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?
George Berkeley, The Analyst
When calculus was invented, it didn't make sense. Newton and Leibniz played fast and dirty with mathematical rigor to develop methods that arrived at the correct answers, but no one knew why. It took another one and a half centuries for Cauchy and Weierstrass to develop analysis, and in the meantime people like Berkeley refused to accept the methods utilizing these "ghosts of departed quantities."
Cauchy's and Weierstrass's solution to the crisis of calculus was to define infinitesimals in terms of limits. In other words, to not describe the behavior of functions directly acting on infinitesimals, but rather to frame the entire endeavour as studying the behaviors of certain operations in the limit, in that weird superposition of being arbitrarily close to something yet not it.
(And here I realize that math is better shown, not told)
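The limit of a function \(f(x)\) at \(x = a\) is \(L\) if for any \(\epsilon > 0\) there exists some \(\delta > 0\) such that if
\[0 < |x - a| < \delta\]
then
\[|f(x) - L| < \epsilon.\]
Essentially, the limit exists if there's some value \(\delta\) that forces \(f(x)\) to be within \(\epsilon\) of \(L\) if \(x\) is within \(\delta\) of \(a\). Note that this has to hold true for all \(\epsilon\), and you choose \(\epsilon\) first!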
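To make the "choose epsilon first, then find delta" order concrete, here is a minimal numerical sketch. It assumes the specific function f(x) = x², the point a = 2 and the limit L = 4, none of which come from the post; `delta_for` and `check` are hypothetical helper names. Sampling points is only a sanity check, not a proof.

```python
def delta_for(epsilon: float) -> float:
    """Return a delta that works for f(x) = x**2 at a = 2 (assumed example).

    If |x - 2| < delta <= 1 then |x + 2| < 5, so
    |x**2 - 4| = |x - 2| * |x + 2| < 5 * delta <= epsilon.
    """
    return min(1.0, epsilon / 5.0)


def check(epsilon: float, samples: int = 1000) -> bool:
    """Sample points with 0 < |x - 2| < delta and test |f(x) - 4| < epsilon."""
    delta = delta_for(epsilon)
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for x in (2 - offset, 2 + offset):
            if not abs(x * x - 4) < epsilon:
                return False
    return True
```

Calling `check(0.1)` or `check(1e-6)` tries the delta produced for that epsilon on sampled points on both sides of 2.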
From this we get the well-known definition of the derivative:
\[f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},\]
and you can define the integral similarly.
The limit solved calculus's rigor problem. From the limit the entire field of analysis was invented and placed on solid ground, and this foundation has stood to this day.
Yet, it seems like we lose something important when we replace the idea of the "infinitesimally small" with the "arbitrarily close to." Could we actually make numbers that were infinitely small?
The Sequence Construction
Imagine some mathematical object that had all the relevant properties of the real numbers (addition, multiplication are associative and commutative, is closed, etc.) but had infinitely small and infinitely large numbers. What does this object look like?
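We can take the set of all infinite sequences of real numbers \(\mathbb{R}^\mathbb{N}\) as a starting point. A typical element \(a \in \mathbb{R}^\mathbb{N}\) would be
\[a = (a_0, a_1, a_2, \ldots),\]
where \(a_0, a_1, a_2, \ldots\) is some infinite sequence of real numbers.
We can define addition and multiplication element-wise as:
\[(a_0, a_1, a_2, \ldots) + (b_0, b_1, b_2, \ldots) = (a_0 + b_0, a_1 + b_1, a_2 + b_2, \ldots),\]
\[(a_0, a_1, a_2, \ldots) \cdot (b_0, b_1, b_2, \ldots) = (a_0 b_0, a_1 b_1, a_2 b_2, \ldots).\]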
You can verify that this is a commutative ring, which means that these operations behave nicely. Yet, being a commutative ring is not the same thing as being an ordered field, which is what we eventually want if our desired object is to have the same properties as the reals.
To get from \(\mathbb{R}^\mathbb{N}\) to a field structure, we have to modify it to accommodate well-defined division. The typical way of doing this is looking at how to introduce the zero product property: i.e. ensuring that if \(ab = 0\) for \(a, b \in \mathbb{R}^\mathbb{N}\), then at least one of \(a, b\) is \(0\).
If we let \(0\) be the sequence of all zeros in \(\mathbb{R}^\mathbb{N}\), then it is clear that we can have two non-zero elements multiply to get zero. If we have
\[a = (1, 0, 1, 0, \ldots)\]
and
\[b = (0, 1, 0, 1, \ldots),\]
then neither of these is the zero element, yet their product is zero.
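The failure of the zero product property can be checked mechanically on finite prefixes. The sketch below models a sequence as a function from an index to a real number and takes the two alternating sequences (1, 0, 1, 0, ...) and (0, 1, 0, 1, ...) as the example; the names `Seq`, `add`, `mul` and `prefix` are hypothetical, and only finitely many entries can ever be inspected.

```python
from typing import Callable, List

Seq = Callable[[int], float]  # a sequence is modelled as a function n -> a_n


def add(a: Seq, b: Seq) -> Seq:
    """Element-wise sum of two sequences."""
    return lambda n: a(n) + b(n)


def mul(a: Seq, b: Seq) -> Seq:
    """Element-wise product of two sequences."""
    return lambda n: a(n) * b(n)


def prefix(a: Seq, length: int) -> List[float]:
    """First `length` entries of a sequence, for inspection."""
    return [a(n) for n in range(length)]


a: Seq = lambda n: 1.0 if n % 2 == 0 else 0.0  # (1, 0, 1, 0, ...)
b: Seq = lambda n: 0.0 if n % 2 == 0 else 1.0  # (0, 1, 0, 1, ...)

print(prefix(mul(a, b), 8))  # every entry is 0.0
print(prefix(add(a, b), 8))  # every entry is 1.0
```

Neither sequence is the all-zero sequence, yet every entry of the product is zero, which is exactly the obstruction to division described above.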
How do we fix this? Equivalence classes!
Our problem is that there are too many distinct "zero-like" things in the ring of real numbered sequences. Intuitively, we should expect a sequence like \((1, 0, 0, 0, \ldots)\) to be basically zero, and we want to find a good condensation of \(\mathbb{R}^\mathbb{N}\) that allows for this.
In other words, how do we make all the sequences with "almost all" of their elements zero equal to zero?
Almost All Agreement ft. Ultrafilters
Taken from "five ways to say "Almost Always" and actually mean it":
A filter \(\mathcal{F}\) on an arbitrary set \(I\) is a collection of subsets of \(I\) that is closed under set intersections and supersets. (Note that this means that the smallest filter on \(I\) is \(\{I\}\), the collection containing only \(I\) itself.)
An ultrafilter is a filter which, for every \(A \subseteq I\), contains either \(A\) or its complement. A principal ultrafilter contains a finite set.
A nonprincipal ultrafilter does not.
This turns out to be an incredibly powerful mathematical tool, and can be used to generalize the concept of "almost all" to esoteric mathematical objects that might not have well-defined or intuitive properties.
Let's say we define some nonprincipal ultrafilter \(\mathcal{U}\) on the natural numbers. This will contain all cofinite sets, and will exclude all finite sets. Now, let's take two sequences \(a, b \in \mathbb{R}^\mathbb{N}\) and define their agreement set \(I\) to be the indices on which \(a, b\) are identical (have the same real number in the same position).
Observe that \(I\) is a set of natural numbers. If \(I \in \mathcal{U}\) then \(I\) cannot be finite, and it seems pretty obvious that almost all the elements in \(a, b\) are the same (they only disagree at a finite number of places after all). Conversely, if \(I \notin \mathcal{U}\), this implies that \(\mathbb{N} \setminus I \in \mathcal{U}\), which means that \(a, b\) disagree at almost all positions, so they probably shouldn't be equal.
Voila! We have a suitable definition of "almost all agreement": if the agreement set \(I\) is contained in some arbitrary nonprincipal ultrafilter \(\mathcal{U}\).
Let \({}^*\mathbb{R}\) be the quotient set of \(\mathbb{R}^\mathbb{N}\) under this equivalence relation (essentially, the set of all distinct equivalence classes of \(\mathbb{R}^\mathbb{N}\)). Does this satisfy the zero product property?
(Notation note: we will let \((a)\) denote the constant infinite sequence of the real number \(a\), and \([a]\) the equivalence class of the sequence \((a)\) in \({}^*\mathbb{R}\).)
Yes, This Behaves Like The Real Numbers
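Let \(a, b \in \mathbb{R}^\mathbb{N}\) such that \(ab = (0)\). Let's break this down element-wise: either \(a_n\) or \(b_n\) must be zero for all \(n \in \mathbb{N}\). As one of the ultrafilter axioms is that it must contain a set or its complement, either the index set of the zero elements in \(a\) or the index set of the zero elements in \(b\) will be in any nonprincipal ultrafilter on \(\mathbb{N}\): if the zero set of \(a\) is not in the ultrafilter, then its complement is, and that complement is contained in the zero set of \(b\), which is therefore in the ultrafilter by closure under supersets. Therefore, either \(a\) or \(b\) is equivalent to \((0)\) in \({}^*\mathbb{R}\), so \({}^*\mathbb{R}\) satisfies the zero product property.
Therefore, division is well defined on \({}^*\mathbb{R}\)! (A class that is not \([0]\) is nonzero at almost all positions, so it can be inverted position by position there.) Now all we need is an ordering, and luckily almost all agreement saves the day again. We can say for \(a, b \in {}^*\mathbb{R}\) that \(a > b\) if almost all elements in \(a\) are greater than the elements in \(b\) at the same positions (using the same ultrafilter equivalence).
So, \({}^*\mathbb{R}\) is an ordered field!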
Infinitesimals and Infinitely Large Numbers
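We have the following hyperreal:
\[\epsilon = \left(1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}, \ldots\right).\]
Recall that we embed the real numbers into the hyperreals by assigning every real number \(a\) to the equivalence class \([a]\). Now observe that \(\epsilon\) is smaller than every positive real number embedded into the hyperreals this way.
Pick some arbitrary positive real number \(r\). There exists \(p \in \mathbb{N}\) such that \(\frac{1}{p} < r\). Every fraction of the form \(\frac{1}{n}\), where \(n\) is a natural number greater than \(p\), is smaller than \(r\), and only finitely many positions come before that point, so \(\epsilon\) is smaller than \((r)\) at almost all positions, so it is smaller than \([r]\).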
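The argument only needs the set of "bad" positions to be finite, and that part is easy to compute. A minimal sketch, assuming the sequence under discussion is (1, 1/2, 1/3, ...) indexed from n = 1; `threshold` and `exceptions` are hypothetical names.

```python
import math
from typing import List


def threshold(r: float) -> int:
    """Return a natural number p such that 1/n < r for every n > p (needs r > 0)."""
    if r <= 0:
        raise ValueError("r must be positive")
    return math.floor(1 / r) + 1


def exceptions(r: float) -> List[int]:
    """Indices n >= 1 where 1/n >= r; there are only finitely many of them."""
    return [n for n in range(1, threshold(r) + 1) if 1 / n >= r]
```

For any positive r the list returned by `exceptions` is finite, so its complement is cofinite and therefore belongs to every nonprincipal ultrafilter on the naturals.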
This is an infinitesimal! This is a rigorously and coherently defined infinitesimal number, greater than zero yet smaller than all positive real numbers! In a number system which shares all of the important properties of the real numbers! (Except the Archimedean one, as we will shortly see, but that doesn't really matter.)
Consider the following:
\[\Omega = (1, 2, 3, \ldots).\]
By a similar argument this is larger than all possible real numbers. I encourage you to try to prove this for yourself!
(The Archimedean principle is that which guarantees that if you have any two positive real numbers, you can multiply the smaller by some natural number to become greater than the other. This is not true in the hyperreals. Why? (Hint: \(\Omega\) breaks this when you pair it with any real number.))
How does this tie into calculus, exactly?
Well, we have a coherent way of defining infinitesimals!
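The short answer is that we can define the star operator (also called the standard part operator) \(\operatorname{st}(x)\) as that which maps any finite hyperreal to its closest real counterpart. Then, the definition of a derivative becomes
\[f'(x) = \operatorname{st}\left(\frac{{}^*f(x + \Delta x) - {}^*f(x)}{\Delta x}\right),\]
where \(\Delta x\) is any nonzero infinitesimal (the result has to be the same for every choice), and \({}^*f\) is the natural extension of \(f\) to the hyperreals. More on this in a future blog post!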
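For polynomial functions the standard-part recipe can be carried out exactly on a computer. The sketch below is an illustration under strong assumptions, not the general construction: it only handles hyperreals that are polynomials in one fixed positive infinitesimal (called `eps` here), stored as coefficient lists, and it only differentiates polynomial f. All names (`EpsPoly`, `st`, `divide_by_eps`, `derivative`) are hypothetical.

```python
from typing import List

# c[0] + c[1]*eps + c[2]*eps**2 + ... is stored as the coefficient list c,
# where eps stands for one fixed positive infinitesimal.
EpsPoly = List[float]


def _pad(p: EpsPoly, n: int) -> EpsPoly:
    return p + [0.0] * (n - len(p))


def add(p: EpsPoly, q: EpsPoly) -> EpsPoly:
    n = max(len(p), len(q))
    return [s + t for s, t in zip(_pad(p, n), _pad(q, n))]


def sub(p: EpsPoly, q: EpsPoly) -> EpsPoly:
    n = max(len(p), len(q))
    return [s - t for s, t in zip(_pad(p, n), _pad(q, n))]


def mul(p: EpsPoly, q: EpsPoly) -> EpsPoly:
    out = [0.0] * (len(p) + len(q) - 1)
    for i, s in enumerate(p):
        for j, t in enumerate(q):
            out[i + j] += s * t
    return out


def divide_by_eps(p: EpsPoly) -> EpsPoly:
    """Divide by eps; only possible when the real (constant) part is zero."""
    if abs(p[0]) > 1e-12:
        raise ValueError("quotient would not be finite")
    return p[1:] or [0.0]


def st(p: EpsPoly) -> float:
    """Standard part: drop every term that carries a positive power of eps."""
    return p[0]


def derivative(coeffs: List[float], x: float) -> float:
    """Derivative at x of f(t) = sum(coeffs[k] * t**k) via st((f(x+eps) - f(x)) / eps)."""
    def f_star(t: EpsPoly) -> EpsPoly:  # Horner evaluation on eps-polynomials
        result = [0.0]
        for c in reversed(coeffs):
            result = add(mul(result, t), [c])
        return result

    return st(divide_by_eps(sub(f_star([x, 1.0]), f_star([x]))))
```

For f(t) = t² at x = 3, the difference quotient works out to 6 + eps, and taking the standard part discards the eps term and leaves 6.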
It also turns out the hyperreals have a bunch of really cool applications in fields far removed from analysis. Check out my expository paper on the intersection of nonstandard analysis and Ramsey theory for an example!
Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.
27 comments
comment by Richard_Kennaway · 2023-10-16T09:26:42.634Z · LW(p) · GW(p)
Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.
Different people will have different intuitions. I've always found the epsilon-delta method clear and simple, and infinitesimals made of shadows and fog when used as a basis for calculus. Every infinitesimals-first approach I have seen involves unexplained magic or papered-over cracks at some point, unexplained and papered-over because at the stage of first learning calculus the student usually doesn't know any formal logic. There's a reason that infinitesimals were only put on a sound footing a century after epsilon-delta. Mathematical logic had to be invented first.
Here the magic lies in depending on the axiom of choice to get a non-principal ultrafilter. And I believe I see a crack in the above definition of the derivative. \(f\) is a function on the non-standard reals, but its derivative is defined to only take standard values, so it will be constant in the infinitesimal range around any standard real. If \(f(x) = x^2\), then its derivative should surely be \(2x\) everywhere. The above definition only gives you that for standard values of \(x\).
I also think that making it more intuitive is missing the point of learning—really learning—mathematics. The idea of the slope of a curve is already intuitive. What is needed is to show the student a way of thinking about these things that does not depend on the breath of intuition to keep it aloft.
Replies from: randomwalks↑ comment by Yudhister Kumar (randomwalks) · 2023-10-18T14:38:09.669Z · LW(p) · GW(p)
Here the magic lies in depending on the axiom of choice to get a non-principal ultrafilter. And I believe I see a crack in the above definition of the derivative. \(f\) is a function on the non-standard reals, but its derivative is defined to only take standard values, so it will be constant in the infinitesimal range around any standard real. If \(f(x) = x^2\), then its derivative should surely be \(2x\) everywhere. The above definition only gives you that for standard values of \(x\).
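Yep, the definition is wrong. If \(f: \mathbb{R} \to \mathbb{R}\) then let \({}^*f: {}^*\mathbb{R} \to {}^*\mathbb{R}\) denote the natural extension of this function to the hyperreals (considering \({}^*\mathbb{R}\) behaves like \(\mathbb{R}\) this should work in most cases). Then, I think the derivative should be
\[f'(x) = \operatorname{st}\left(\frac{{}^*f(x + \Delta x) - {}^*f(x)}{\Delta x}\right).\]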
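W.r.t. what the derivative of \({}^*f\) should be, I imagine you can describe it similarly in terms of \({}^*(f')\), which by the transfer principle should exist (which applies because of Łoś's theorem, which I don't claim to fully understand).
For \(f(x) = x^2\) the derivative then is:
\[f'(x) = \operatorname{st}\left(\frac{(x + \Delta x)^2 - x^2}{\Delta x}\right) = \operatorname{st}(2x + \Delta x) = 2x.\]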
comment by JBlack · 2023-10-16T05:06:44.977Z · LW(p) · GW(p)
Just in case anyone was wondering why we can't have any finite sets in the ultrafilter:
If some finite set {n1, n2, ..., n_k} is in an ultrafilter U, then either {n1, n2, ..., n_(k-1)} is in U or I \ {n1, n2, ..., n_(k-1)} is in U. In the latter case, the intersection with the original set is {n_k}, which must be in U. In the former case, you can keep repeating this until you are left with some other one-element set.
If any one-element set {n} is in U, then membership in U is just decided by whether a set contains n or not.
When you go through the equivalence construction, this means that two sequences are equivalent if and only if they agree at the n'th position, which means that all the operations are just the same as arithmetic on that position with the rest not mattering at all. So to get anything different, U really does have to be a non-principal ultrafilter.
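This collapse can be checked by brute force on a small index set. The sketch below uses a finite stand-in universe {0, 1, 2, 3} (an assumption made for illustration: on a finite set every ultrafilter is principal, so this only demonstrates the degenerate case, not the nonprincipal one). The names `principal`, `is_ultrafilter` and `equivalent` are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Sequence

Member = Callable[[FrozenSet[int]], bool]  # membership test for a family of sets


def subsets(universe: Iterable[int]) -> List[FrozenSet[int]]:
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def principal(n: int) -> Member:
    """Principal ultrafilter generated by {n}: a set belongs iff it contains n."""
    return lambda s: n in s


def is_ultrafilter(member: Member, universe: Iterable[int]) -> bool:
    """Brute-force check of the filter and ultrafilter axioms on a finite universe."""
    whole = frozenset(universe)
    all_subsets = subsets(whole)
    for s in all_subsets:
        if member(s) == member(whole - s):  # need exactly one of s, complement
            return False
        for t in all_subsets:
            if member(s) and member(t) and not member(s & t):
                return False  # not closed under intersection
            if member(s) and s <= t and not member(t):
                return False  # not closed under supersets
    return True


def equivalent(a: Sequence[float], b: Sequence[float], member: Member) -> bool:
    """Two finite sequences are equivalent iff their agreement set is in the family."""
    agreement = frozenset(i for i in range(len(a)) if a[i] == b[i])
    return member(agreement)
```

With `U = principal(2)`, `equivalent(a, b, U)` is true exactly when `a[2] == b[2]`, so arithmetic on equivalence classes is just arithmetic on position 2.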
comment by Flying Pen and Paper (flying-pen-and-paper) · 2023-11-07T17:59:46.957Z · LW(p) · GW(p)
Observe that \(I\) is a set of natural numbers. If \(I \in \mathcal{U}\) then \(I\) cannot be finite, and it seems pretty obvious that almost all the elements in \(a, b\) are the same (they only disagree at a finite number of places after all).
The bracketed remark doesn't appear to be true. Why can we not have \(I = \{0, 2, 4, \ldots\}\) or \(I = \{1, 3, 5, \ldots\}\)? Indeed, by the definition of an ultrafilter, we must have one of them in \(\mathcal{U}\). Also, in the post, you use \(I\) for two different purposes, which makes the post slightly less clear.
comment by quiet_NaN · 2023-10-18T01:37:37.011Z · LW(p) · GW(p)
Some random thoughts.
First, it would be nice if one could go from rationals to hyperreals directly without having to define the reals in between (especially for people with limit allergies, as the reals are sometimes defined as limits of Cauchy sequences). I don't see a straightforward way to do so though: you can hardly allow people to encode their reals as sequences of rationals, otherwise the sequence \((1, \frac{1}{2}, \frac{1}{3}, \ldots)\) would have to be equivalent to zero instead of an infinitesimal.
Also, one could split the hyperreals into equivalence classes within which the Archimedean property holds. Using the big-O adjacent notation, the reals would be , and the hyperreal called above would be . Stretching the big-O notation, one could call the equivalence class of something like . So one has a rather large zoo of these equivalence classes. This would imply that there is no Archimedean equivalence class for the smallest infinite hyperreal. If a hyperreal is infinite (that is, diverges), then is a smaller infinite hyperreal.
I am well used to there being no biggest infinity, but there being no smallest infinity would indicate that these things are neither equivalent to cardinals nor ordinals.
comment by tailcalled · 2023-10-16T15:08:44.228Z · LW(p) · GW(p)
I found Terry Tao's writing on the topic to be helpful for understanding, especially the connection between nonprincipal ultrafilters and Arrow's Impossibility Theorem.
comment by Adrià Garriga-alonso (rhaps0dy) · 2023-10-16T00:32:00.068Z · LW(p) · GW(p)
Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.
I think hyperreals are too complicated for calculus 1 and you should just talk about a non-rigorous "infinitesimal" like Newton and Leibniz did.
Replies from: randomwalks↑ comment by Yudhister Kumar (randomwalks) · 2023-10-18T14:46:28.377Z · LW(p) · GW(p)
I agree. This is what I was going for in that paragraph. If you define derivatives & integrals with infinitesimals, then you can actually do things like treating dy/dx as a fraction without partaking in the half-in half-out dance that calc 1 teachers currently have to do.
I don't think the pedagogical benefit of nonstandard analysis is to replace Analysis I courses, but rather to give a rigorous backing to doing algebra with infinitesimals ("an infinitely small thing plus a real number is the same real number, an infinitely small thing times a real number is zero"). *Improper integrals would make a lot more sense this way, IMO.
Replies from: rhaps0dy↑ comment by Adrià Garriga-alonso (rhaps0dy) · 2023-10-19T15:38:17.157Z · LW(p) · GW(p)
Thank you, that makes sense!
Indefinite integrals would make a lot more sense this way, IMO
Why so? I thought they already made sense, they're "antiderivatives", so a function such that taking its derivative gives you the original function. Do you need anything further to define them?
(I know about the definite integral Riemann and Lebesgue definitions, but I thought indefinite integrals were much easier in comparison.)
Replies from: randomwalks, randomwalks↑ comment by Yudhister Kumar (randomwalks) · 2023-10-19T17:38:51.941Z · LW(p) · GW(p)
Language mix-up. Meant improper integrals.
Now that I'm thinking about it, my memory's fuzzy on how you'd actually calculate them rigorously w/infinitesimals. Will get back to you with an example.
↑ comment by Yudhister Kumar (randomwalks) · 2023-10-22T19:50:29.411Z · LW(p) · GW(p)
comment by Adrià Garriga-alonso (rhaps0dy) · 2023-10-16T00:27:59.096Z · LW(p) · GW(p)
Voila! We have a suitable definition of "almost all agreement": if the agreement set \(I\) is contained in some arbitrary nonprincipal ultrafilter \(\mathcal{U}\).
Isn't it easier to just say "If the agreement set has a nonfinite number of elements"? Why the extra complexity?
must contain a set or its complement
Oh I see, so defining it with ultrafilters rules out situations like \(a = (1, 0, 1, 0, \ldots)\) and \(b = (0, 1, 0, 1, \ldots)\) where both have infinitely many zeros and yet their product is zero.
Replies from: JBlack↑ comment by JBlack · 2023-10-16T04:50:53.395Z · LW(p) · GW(p)
The post is wrong in saying that U contains only cofinite sets. It obviously must contain plenty of sets that are neither finite nor cofinite, because the complements of those sets are also neither finite nor cofinite. Possibly the author intended to type "contains all cofinite sets" instead.
In particular, exactly one of a or b is equivalent to zero in *R.
Which one is equivalent to zero depends upon exactly which non-principal ultrafilter you choose, as there are infinitely many non-principal ultrafilters. Unfortunately (as with many other applications of the Axiom of Choice) there is no finite way to specify which ultrafilter you mean.
Replies from: randomwalks, quiet_NaN↑ comment by Yudhister Kumar (randomwalks) · 2023-10-18T14:04:31.346Z · LW(p) · GW(p)
The post is wrong in saying that U contains only cofinite sets. It obviously must contain plenty of sets that are neither finite nor cofinite, because the complements of those sets are also neither finite nor cofinite. Possibly the author intended to type "contains all cofinite sets" instead.
Yep, this is correct! I've updated the post to reflect this.
E.g. if an ultrafilter contains the set of all even naturals, it won't contain the set of all odd naturals, neither of which are finite or cofinite.
↑ comment by quiet_NaN · 2023-10-18T00:40:59.860Z · LW(p) · GW(p)
Thanks, this is helpful to point out.
Of course, this makes all of this rather abstract. It looks to me like for almost any two hyperreals (e.g. a, b as above), the answer to "which of them is larger?" is "It depends on the ultrafilter. Also, I can not tell you if a set is part of any specific ultrafilter. But fear not, for any given ultrafilter, the hyperreals are totally ordered."
Basically for any usable theorem, one would have to prove that the result is independent of the actual ultrafilter used, which means that numbers such as a and b will probably not feature in them a lot.
I can not fault my analysis 1 professor for opting to stick to the reals (abstract as they already are) instead.
comment by Viliam · 2023-10-15T22:45:27.908Z · LW(p) · GW(p)
I don't understand some of the words you used, so please correct me if I am wrong. What are the equivalents of the original natural numbers here? Is it like 2 = { (2, 2, 2...), and all sequences that contain an infinite number of 2's and a finite number of anything else } ?
Then we would have a partially ordered set, because 2 is neither greater than nor smaller than { (1, 3, 1, 3, 1, 3...), and its equivalents }. Is that okay?
Replies from: joseph-van-name↑ comment by Joseph Van Name (joseph-van-name) · 2023-10-16T00:16:06.104Z · LW(p) · GW(p)
Yes. We have 2=[(2,2,2,...)]. But we can compare 2 with (1,3,1,3,1,3,...) since (1,3,1,3,1,3,1,3,...)=1 (this happens when the set of all even natural numbers is in your ultrafilter) or (1,3,1,3,1,3,1,3,...)=3 (this happens when the set of all odd natural numbers is in your ultrafilter). Your partially ordered set is actually a linear ordering because whenever we have two sequences \((a_n)_n, (b_n)_n\), one of the sets
\[\{n : a_n < b_n\}, \quad \{n : a_n = b_n\}, \quad \{n : a_n > b_n\}\]
is in your ultrafilter (you can think of an ultrafilter as a thing that selects one block out of every partition of the natural numbers into finitely many pieces), and if your ultrafilter contains
\[\{n : a_n < b_n\},\]
then \([(a_n)_n] < [(b_n)_n]\).
comment by Valdes (Cossontvaldes) · 2023-10-16T15:04:12.030Z · LW(p) · GW(p)
Thank you for this. It looks like a good first contact with hyperreals.
Two nitpicks:
- Ω=(1,2,3,ldots). --> I think you forgot a "\" here and it is messing your formatting up.
- It is not clear in the post why we use a hyperfilter, rather than just the set of all infinite sets.
↑ comment by quiet_NaN · 2023-10-17T23:04:14.778Z · LW(p) · GW(p)
Furthermore after
Conversely, if I∉U,this implies that the complement of I
the slash is used for the setminus operation. I think using \setminus there (which generates a backslash) would be a more standard notation less likely to be mistaken for quotient structures.
Replies from: randomwalks↑ comment by Yudhister Kumar (randomwalks) · 2023-10-18T14:17:25.782Z · LW(p) · GW(p)
I'm familiar with \setminus being used to denote set complements, so \not\in seemed more appropriate to me (\(I\) is not an element of \(\mathcal{U}\)). I interpret \(I \setminus \mathcal{U}\) as "the elements of \(I\) not in \(\mathcal{U}\)," which is the empty set in this case? (also the elements of \(\mathcal{U}\) are sets of naturals while the elements of \(I\) are naturals, so it's unclear to me how much this makes sense)
Replies from: quiet_NaN↑ comment by quiet_NaN · 2023-10-20T12:06:09.417Z · LW(p) · GW(p)
Sorry, I was only quoting parts of the sentence.
What I meant was that I would change
Conversely, if I∉U, this implies that N/I∈U, which means that a, b disagree at almost all positions, so they probably shouldn't be equal.
to
Conversely, if I∉U, this implies that N∖I∈U, which means that a, b disagree at almost all positions, so they probably shouldn't be equal.
↑ comment by Joseph Van Name (joseph-van-name) · 2023-10-16T16:13:34.494Z · LW(p) · GW(p)
I have heard of filters and ultrafilters, but I have never heard of anyone calling any sort of filter a hyperfilter. Perhaps it is because the ultrafilters are used to make fields of hyperreal numbers, so we can blame this on the terminology. Similarly, the uniform spaces where the hyperspace is complete are called supercomplete instead of hypercomplete.
But the reason why we need to use a filter instead of a collection of sets is that we need to obtain an equivalence relation.
Suppose that is an index set and is a set with for . Then let be a collection of subsets of . Define a relation on by setting if and only if . Then in order for to be an equivalence relation, must be reflexive, symmetric, and transitive. Observe that is always symmetric, and is reflexive precisely when .
Proposition: The relation is transitive if and only if is a filter.
Proof:
Suppose that is a filter. Then whenever , we have
, so since
, we conclude that as well. Therefore, .
. Suppose now that . Then let let where denotes the characteristic function. Then and . Therefore,, so by transitivity, as well, hence .
Suppose now that and . Let and set .
Observe that and . Therefore, . Thus, by transitivity, we know that . Therefore, . We conclude that is closed under taking supersets. Therefore, is a filter.
Q.E.D.
Replies from: Cossontvaldes↑ comment by Valdes (Cossontvaldes) · 2023-10-17T08:29:05.785Z · LW(p) · GW(p)
I have heard of filters and ultrafilters, but I have never heard of anyone calling any sort of filter a hyperfilter.
Oops, my bad. I re-read the post as I was typing to make sure I hadn't missed any explanation. That can sometimes cause me to type what I read instead of what I intended. I probably swapped the prefixes because they feel similar.
Thank you for the math. I am not sure everything is right with your notations in the second half, it seems to me there must be a typo either for the intersection case or the superset one. But the ideas are clear enough to let me complete the proof.
comment by Dacyn · 2023-10-16T14:51:42.288Z · LW(p) · GW(p)
The definition of a derivative seems wrong. For example, suppose that \(f(x) = 0\) for rational \(x\) but \(f(x) = 1\) for irrational \(x\). Then \(f\) is not differentiable anywhere, but according to your definition it would have a derivative of 0 everywhere (since \(\Delta x\) could be an infinitesimal consisting of a sequence of only rational numbers).
Replies from: randomwalks↑ comment by Yudhister Kumar (randomwalks) · 2023-10-18T15:05:59.811Z · LW(p) · GW(p)
Have updated the definition of the derivative to specify the differences between \({}^*f\) over the hyperreals and \(f\) over the reals.
I think the natural way to extend your \(f\) to the hyperreals is for it to take values in an infinitesimal neighborhood surrounding rationals to 0 and all other values to 1. Using this, the derivative is in fact undefined, as
comment by Leo P. · 2023-10-16T09:46:57.949Z · LW(p) · GW(p)
First, I don't think it's a good idea to have to rely on the axiom of choice in order to be able to define continuity.
Now, from my point of view, saying that continuity is defined in terms of limits is the wrong way to look at it. Continuity is a property relative to the topology of your space. If you define continuity in terms of open sets, I find that not only does the definition make sense, but it also extends in general to any topological space. But I kind of understand that not everyone will find this intuitive.
Also, I believe that your definitions that replace the limits in terms of hyperreals have to take into account all possible infinitesimals, and thus I don't understand how it's really any different from the sequential characterization of limits. But maybe I'm missing something.
Replies from: joseph-van-name↑ comment by Joseph Van Name (joseph-van-name) · 2023-10-16T19:01:11.553Z · LW(p) · GW(p)
Let \(X,Y\) be topological spaces. Then a function \(f:X\rightarrow Y\) is continuous if and only if whenever \((x_d)_{d\in D}\) is a net that converges to the point \(x\), the net \((f(x_d))_{d\in D}\) also converges to the point \(f(x)\). This is not very hard to prove. This means that we do not have to discuss whether continuity should be defined in terms of open sets instead of limits because both notions apply to all topological spaces. If anything, one should define continuity in terms of closed sets instead of open sets since closed sets generalize slightly better to objects known as closure systems (which are like topological spaces, but we do not require the union of two closed sets to be closed). For example, the collection of all subgroups of a group is a closure system, but the complements of the subgroups of a group have little importance, so if we want the definition that makes sense in the most general context, closed sets behave better than open sets. And as a bonus, the definition of continuity works well when we are taking the inverse image of closed sets and when we are taking the closure of the image of a set.
With that being said, the good thing about continuity is that it has enough characterizations so that at least one of these characterizations is satisfying (and general topology texts should give all of these characterizations even in the context of closure systems so that the reader can obtain such satisfaction with the characterization of his or her choosing).
comment by rotatingpaguro · 2023-10-16T02:50:58.484Z · LW(p) · GW(p)
Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.
Uhm, hyperreals really look like packaged limits, I don't expect understanding them is easier than understanding limits.