Your points about the difficulty of getting uniform results in this framework are interesting. My inclination is to regard this as a failure of the framework. The LI paper introduced the idea of "e.c. traders," and the goal of not being exploitable (in some sense) by such traders; these weren't well-established notions about which the paper simply proved some new theorems. So they are up for critique as much as anything else in the paper (indeed, they are the only things up for critique, since I'm not disputing that the theorems themselves follow from the premises). And if our chosen framework only lets us prove something that is too weak, while leaving the most obvious strengthening clearly out of reach, that suggests we are not looking at the problem (the philosophical problem, about how to think about logical induction) at the right level of "resolution."
As I said to Vadim earlier, I am not necessarily pessimistic about the performance of some (faster?) version of LIA with a "good" ordering for T^k. But if such a thing were to work, it would be for reasons above and beyond satisfying the LI criterion, and I wouldn't expect the LI criterion to do much work in illuminating its success. (It might serve as a sanity check -- too weak, but its negation would be bad -- but it might not end up being the kind of sanity check we want, i.e. the failures it does not permit might be just those required for good and/or fast finite-time performance. I don't necessarily think this is likely, but I won't know if it's true or not until the hypothetical work on LIA-like algorithms is done.)
Thanks for pointing out that bound. I will think about it. (BTW, if at any point you don't want to continue this back-and-forth exchange, just let me know; otherwise I will probably keep responding, because I always find I have things to say.)
My point regarding LIA was that the theorems in the LI paper follow from dominance over all e.c. traders, and there are countably many e.c. traders. If you stop at some finite point in the enumeration, all those theorems break. Of course, you will still get something out of dominating the traders you do keep, but you'd have to go back to the blackboard to figure out what it is, because you can no longer get the answer from the paper. (And the theorems give you infinitely many properties, of which you'll retain only finitely many, so in a sense you will lose "almost all" of the desired results.)
Surely there is something special about the order of hypotheses in SRM? Vapnik's 1998 book introduces SRM (start of Ch. 6) with a decomposition of actual risk into (1) empirical risk and (2) a term depending on VC dimension, analogous to the bias-variance decomposition of generalization error. Vapnik says SRM is for the case where the sample size is small enough that term (2) is non-negligible. So already, from the start, we are trying to solve a problem where "simpler = better" appears mathematically in an expression for the error we want to minimize. Then he introduces a rule for selecting a class based on the sample size, and proves an asymptotic convergence rate (Thm. 6.2) which depends on that selection rule having a certain property.
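For reference, the decomposition I have in mind has roughly this shape (schematic only; the exact constants depend on which statement of the bound you use, and I'm writing h for VC dimension, ℓ for sample size, and η for the confidence parameter):

```latex
% With probability at least 1 - \eta, over a class of VC dimension h:
R(f) \;\le\;
  \underbrace{R_{\mathrm{emp}}(f)}_{\text{(1) empirical risk}}
  \;+\;
  \underbrace{\sqrt{\frac{h\,(\ln(2\ell/h) + 1) - \ln(\eta/4)}{\ell}}}_{\text{(2) capacity term}}
```

Term (2) is where "simpler = better" enters: it grows with h and shrinks with ℓ.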
It is certainly true that you could order LIA's traders by complexity (well, maybe not computably...), and I would be interested in the results. Results from some particular "good" ordering seem like the real determinants of whether LIA-like methods would be good in practice. (Since if we do not fix an ordering we can only get results that hold even for bad/"adversarial" orderings that fill early slots with nonsensical strategies)
Okay, it looks like we are roughly on the same page :)
(Yes, my name is Robert, I go by Rob)
I think it is definitely possible that the ideas involved in the construction (e.g. making a "supertrader") may lead to good practical algorithms. But it seems like this issue is orthogonal to the dominance issue. In other words, if you had shown me the constructions first and then the dominance results, I would not have been any more (or less) optimistic about the constructions after seeing those results than before.
It seems to me like there are two totally distinct ingredients here. First, we have a framework for specifying models and choosing which to use (like VC theory), including an idea about model averaging/fusion. Second, we have a trick involving enumerating a countably infinite set. The second part doesn't seem relevant to the finite time case: if I'm running LIA and you tell me that you've changed it so it only enumerates traders up to some large finite N and then stops adding more, this will ruin all of the LI criterion guarantees, but it will not change any of the output I'll see in my lifetime, and the method for combining traders will remain as useful (or not) as it was before.
It's interesting to compare this to structural risk minimization, where we also have a potential infinity (nested classes of increasing VC dimension), but we are given a good ordering for them (increasing VC dimension). One could do just as well asymptotically by choosing any other order for the classes: they are countable, so you will eventually hit the best one. But in practice the VC dimension order is crucial. The arbitrary enumeration in LIA is unsatisfying to me in the same way an arbitrary ordering of the VC classes would be. (Edit: you could even say SRM succeeds because it lets you avoid counting up to infinity, by giving you an ordering under which the minimum guaranteed risk class for your sample occurs early enough that you can actually find it.)
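To make the contrast concrete, here is a minimal sketch (my own toy illustration, not Vapnik's procedure verbatim; the class structure and risk numbers are made up) of why the VC-dimension ordering is doing real work: under it, the guaranteed-risk minimum appears at a small index, so a short scan finds it, while an arbitrary permutation of the same classes gives the scan no usable stopping point.

```python
import math

def capacity_term(h, n, eta=0.05):
    """Schematic VC confidence term for a class of VC dimension h and
    sample size n (constants vary across statements of the bound)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def srm_select(classes, n):
    """classes: (vc_dim, empirical_risk) pairs in increasing-VC order.
    Returns the index minimizing the guaranteed-risk bound. Because
    empirical risk falls and the capacity term rises along this
    ordering, the minimum shows up early; an arbitrary ordering gives
    no such signal to stop on."""
    bounds = [emp + capacity_term(h, n) for (h, emp) in classes]
    return min(range(len(classes)), key=bounds.__getitem__)

# Toy nested structure: richer classes fit better but pay more capacity.
classes = [(h, 0.5 / h) for h in range(1, 20)]
print(srm_select(classes, n=100))  # small index, found by a short scan
```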
Vadim, we seem to be talking past each other a bit.
In the linked post, I made some criticisms of some of the (informal/intuitive/motivational) reasoning that was done in the original LI paper. That paper (laudably) makes a number of explicit verbal arguments about why its definitions and results ought to be interesting and promising, and I am disagreeing with those.
In your paper, you are doing something related, but not exactly the same. It's possible that your paper is different from the original LI paper in a way that answers my criticisms (which would be very interesting to me!). Alternately, it's possible that my criticisms of the LI paper are not correct to begin with. But these are two different possibilities, and I can't tell which one you're addressing here.
The definition of an LI (=something satisfying the LI criterion) in the original paper is based on dominance, and I've argued that this is too weak to pick out a category of algorithms with good finite-time properties. Telling me "this thing is an LI," in the sense used in that paper, tells me nothing about its finite-time properties, for the reasons I explained earlier.
Even granting that, it is of course still possible that some more restrictive criterion, of the form "LI criterion + (something else)", would pick out a very promising class of algorithms.
So, are you saying
(1) I am wrong and the LI criterion is satisfying (in some way I'm not recognizing)?
(2) I'm right that the LI criterion is unsatisfying, but if we add more specifics, we get something promising? (If so, the "something else" would be very important and I would like to learn more about precisely what it is)
(3) Neither of the above?
(N.B. I am the author of the linked post, not Tarn Fletcher. He offered to link it because I was wary of logging in via Facebook. I've done so now, though.)
My central claim is that the desiderata are too weak, not that they are not pragmatically achievable. Specifically, I think that there is something fundamentally unsatisfactory about the definition of "inexploitability" (AKA "dominance" in your paper). I hadn't seen your paper before and I haven't looked at it in very much detail, but as far as I can tell, my central claim applies to it as well. (I.e. it isn't specific to formal logic as opposed to prediction)
In the tumblr post, I complained that this definition was "asymptotic," which I realize was not very clear. What I meant was that the definition captures only qualitative convergence or lack thereof, throwing away any rate information. Dominance just tells us whether the trader can or can't make an infinite profit, not how quickly the inductor "plugs the hole" which would allow the trader to make an infinite profit if left unplugged forever.
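To be explicit about what gets thrown away: schematically (I'm eliding the LI paper's bookkeeping about plausible valuations, so take this as the shape of the definition rather than the definition itself), writing W_t(T) for trader T's net worth at time t,

```latex
% T exploits the market iff its net worth is bounded below but
% unbounded above:
\inf_t W_t(T) > -\infty \quad\text{and}\quad \sup_t W_t(T) = +\infty
```

Dominance is the negation of this, and t appears only under inf/sup: every rate at which anything happens has been quantified away before the definition is even stated.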
This means that any further convergence results we get from dominance considerations across multiple traders must be pointwise with respect to the individual traders. For a uniform convergence result, we would need access to convergence rate information about each of the individual traders, so we could ensure that they have all gotten within some tolerance ε at the same time. But the dominance statements for the traders lack this information.
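Spelled out, the gap is the usual quantifier swap. Writing plugged_k(t, ε) for "at time t, the hole exploited by trader T^k is plugged to within ε" (my notation, not the paper's):

```latex
% Pointwise -- each trader gets its own rate N_k, about which
% dominance tells us nothing:
\forall k\ \forall \epsilon > 0\ \exists N_k(\epsilon)\ \forall t \ge N_k(\epsilon):\ \mathrm{plugged}_k(t, \epsilon)
% Uniform -- one rate N works for all traders simultaneously:
\forall \epsilon > 0\ \exists N(\epsilon)\ \forall k\ \forall t \ge N(\epsilon):\ \mathrm{plugged}_k(t, \epsilon)
```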
Consider, for instance, procedures like TradingFirm in the LI paper or eq. 31 in your paper, where a set of traders is used to construct a single trader that is "strictly better" (with respect to dominance) than every trader in the set.
Although dominance carries over from each trader in the set, convergence rates and other finite-time properties do not. Concretely: at any time t, all but finitely many of the traders T^k from which the combined trader is inheriting dominance have not yet appeared in the sum, so they are not yet affecting its behavior.
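As a toy version of that staggering (my own sketch, assuming trader k enters at step k with weight 2^-k, which is the usual budgeting trick; it is not the literal TradingFirm or eq. 31 construction):

```python
def make_trader(k):
    """Hypothetical stand-in for e.c. trader k: a function t -> bet."""
    return lambda t: (-1.0) ** k / (k + 1)

def supertrader_bet(t):
    """Toy combined trader: trader k is enumerated in at step k with
    weight 2^-k, so at time t only traders k <= t contribute at all."""
    return sum(2.0 ** -k * make_trader(k)(t) for k in range(t + 1))

# Every trader with k > t has zero influence on the bet at time t, so
# truncating the enumeration at any N > t yields identical behavior up
# through time t -- even though truncation destroys the asymptotic
# dominance guarantees.
print(supertrader_bet(10))
```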
For this reason, telling me only that an inductor dominates a set of traders does not tell me anything about the relation of that inductor to any of those traders at any given time. (After all, for any given trader and time t, it might be using a procedure like the above in which that trader is not considered until some later time t' > t.)
Say I'm interested in this particular set of traders because, for each trader, there is some convergence property related to it which I would like the inductor to have. (This seems quite generally true.) So for each k, I'm associating with the trader T^k some property P_k(t), something like "at time t, the hole exploited by T^k has been plugged to within tolerance ε." What I want to conclude here is that, at time t, some of these properties hold. But suppose the inductor dominates the set via a procedure like the one above. Then whenever I ask about some P_k(t), I have no way of knowing whether t is high enough for T^k to have been included in the sum yet, according to the enumeration used by the procedure. Thus I can't conclude any of the facts P_k(t).
In short, this framework just isn't built for uniform convergence. If we want to get any results about how the inductor is doing "in general" at any given time, we need information that the dominance concept throws away.
Finally, about abstract asymptotic results leading to efficient practical algorithms -- yes, this happens, but it's important to think about what information beyond mere convergence is necessary for it to happen. Consider root-finding for a differentiable function from R to R. Here's one method that converges (given some conditions): Newton's method. Here's another: enumerate the rational numbers in an arbitrary order and evaluate the function at one rational number per timestep. (You can approximate the root arbitrarily well with rationals, the function is continuous, blah blah.) Even though these are both convergent, there's obviously a big difference; the former is actually converging to the result in the intuitive sense of that phrase, while the latter is just trolling you by satisfying your technical criteria but not the intuitions behind them. (Cf. the enumeration-based trader constructions.)
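For concreteness, here are the two methods side by side (a sketch, using f(x) = x^2 - 2 and one standard diagonal enumeration of the positive rationals; any enumeration makes the same point):

```python
from fractions import Fraction
from itertools import count

def f(x):
    return x * x - 2           # root at sqrt(2) = 1.41421356...

def newton(x0, steps):
    """Newton's method: rapid convergence near the root."""
    x = x0
    for _ in range(steps):
        x -= f(x) / (2 * x)    # f'(x) = 2x
    return x

def rationals():
    """Enumerate the positive rationals by diagonals; every rational
    appears eventually, which is all the 'convergence' proof needs."""
    for s in count(2):
        for p in range(1, s):
            yield Fraction(p, s - p)

def enumeration_search(steps):
    """'Convergent' root-finding by enumeration: keep the rational
    with the smallest |f| seen so far. The best-so-far error does go
    to 0, but nothing in the criterion controls the rate."""
    best = None
    for q, _ in zip(rationals(), range(steps)):
        if best is None or abs(f(q)) < abs(f(best)):
            best = q
    return best

print(newton(1.0, steps=6))               # correct to machine precision
print(float(enumeration_search(10_000)))  # still crude after 10^4 evals
```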