post by bentarm · 2011-02-13T23:50:50.941Z · LW · GW · Legacy · 47 comments

In Luke's recent post on what sort of posts we would like to see more of, one suggestion was "Open Thread: Math". This suggestion has been voted up by (at least) 12 people. Since it's going to take me less than 2 minutes to type this post, I figured I might as well just go ahead and post the thread, rather than vote up the suggestion.

So, this is an open thread on mathematics. As things stand, I have no idea what the rules should be (I don't know what the people who voted up the post suggestion expected the rules to be), but I guess the general principle should be that we have maths questions which are vaguely related to LW-type ideas, as there are plenty of more appropriate fora for general mathematical discussion already out there.

comment by Bo102010 · 2011-02-14T05:19:04.660Z · LW(p) · GW(p)

It's been mentioned here on Less Wrong before, but I'll recommend it again - Project Euler. It's a set of 300+ math problems that are to be solved by designing an algorithm to run in under a minute.

Getting into Project Euler last summer is likely the best move I've ever made to improve my programming skills. I'm not a programmer, but coding skills come in handy in lots of places, so I started working through the PE problems as means of learning Python.

Since I started I've replaced almost all my casual reading with research into algorithms and math, and I've gone from a Python novice to a fairly advanced user without it ever seeming like work. Getting the right answer makes you feel smart, which is an ego-stroking way of making you continue.

Replies from: Barry_Cotter, Emile, RobinZ, None, Zack_M_Davis
comment by Barry_Cotter · 2011-02-14T08:57:48.661Z · LW(p) · GW(p)

I've replaced almost all my casual reading with research into algorithms and math

Where did you start? Did you have a vague idea of algorithms before? If you began with introductory texts which ones would you recommend? Did you know any other programming languages before you started on Python? Could one get a job based solely on knowing Python to your level, do you think?

Replies from: Bo102010
comment by Bo102010 · 2011-02-15T14:32:23.280Z · LW(p) · GW(p)

I'd had a "Computer science for electrical engineers" course in school, which discussed data structures and algorithms from a high level (the usual sorting algorithm discussion, implementing a linked list, that kind of thing), but nothing too in-depth. I've had various experience in programming before PE.

In solving PE problems I've mostly used Wikipedia and Mathworld for research, and sometimes I'll Google for lecture notes on a relevant topic.

I've used the Python skills I've picked up from PE in my job already. I think I could function in a more programming-oriented job now, though solving math problems doesn't give you much help in hooking into existing APIs or writing web services, which are probably pretty important.

comment by Emile · 2011-02-14T17:22:14.544Z · LW(p) · GW(p)

I heard about it a few times, but finally just registered ... went through 6 problems, they're pretty neat, thanks :) My maths is getting rusty, this is a nice boost.

comment by RobinZ · 2011-02-14T15:28:05.829Z · LW(p) · GW(p)

I just did the first three - and I think my solution to problem 2 was fairly elegant!

Edit: My profile

comment by [deleted] · 2011-02-14T06:24:47.705Z · LW(p) · GW(p)

I'll have to try this!

comment by Zack_M_Davis · 2011-02-14T06:13:44.943Z · LW(p) · GW(p)

How many have you solved so far? I just got to level one the other day.

Replies from: Bo102010, ciphergoth
comment by Bo102010 · 2011-02-15T14:33:21.653Z · LW(p) · GW(p)

I started last June and am at 196 solved currently.

Replies from: Morendil
comment by Morendil · 2011-02-15T18:48:27.131Z · LW(p) · GW(p)

That's worth a bunch of upvotes. I think I got less than 100 done before losing interest. It was a great way to learn a new programming language (Haskell).

comment by Paul Crowley (ciphergoth) · 2011-02-15T13:45:14.052Z · LW(p) · GW(p)

Couldn't reach level one until the submission rate limiter had cleared...

comment by CronoDAS · 2011-02-14T00:45:26.690Z · LW(p) · GW(p)

What should you use for the "ignorance prior" for the value of a continuous random variable whose underlying distribution is unknown? (For a random variable that is both discrete and finite, you can always use the uniform distribution as your prior, but that doesn't work when the variable can take an infinite number of values.) And you can't always use an improper prior, either...

Replies from: janos, Eugine_Nier, jsalvatier
comment by janos · 2011-02-14T01:36:52.214Z · LW(p) · GW(p)

Would you be surprised if the absolute value was bigger than 3^^^3? I'm guessing yes, very much so. So that's a reason not to use an improper prior.

If there's no better information about the problem, I sortof like using crazy things like Normal(0,1)*exp(Cauchy); that way you usually get reasonable smallish numbers, but you don't become shocked by huge or tiny numbers either. And it's proper.

Replies from: CronoDAS
comment by CronoDAS · 2011-02-14T03:32:17.116Z · LW(p) · GW(p)

Let's say that you know the variable is a real number in [0,1], but nothing else...

Replies from: endoself, Sniffnoy, orthonormal
comment by endoself · 2011-02-14T05:36:38.275Z · LW(p) · GW(p)

If it's a frequency, I believe the accepted prior is (x(1-x))^(-1/2), divided by pi if you want it normalized. This is known as the Jefferys prior.

comment by Sniffnoy · 2011-02-14T04:18:59.986Z · LW(p) · GW(p)

Well if it's you know it's in [0,1], then you've got a perfectly good uniform distribution.

Replies from: roystgnr
comment by roystgnr · 2011-02-14T17:12:21.925Z · LW(p) · GW(p)

What if I know that my uncertain parameter C from a model equation C*exp(T/T0) is in a certain range... but then I wonder whether I should instead be evaluating the exact same equation as exp(T/T0 + D) (with D = ln(C), C = exp(D))? Do I use the uniform distribution on C, which will correspond to a skewed distribution on D, or do I use the uniform distribution on D which will correspond to a skewed distribution on C?

comment by orthonormal · 2011-02-15T01:00:47.607Z · LW(p) · GW(p)

There's a uniform prior that works just fine for this. (I think that EY's infinite set atheism might have left you with the notion that only discrete, finite sets can have uniform priors. This is false.)

Namely, if you take any interval in [0,1] of length p, then your prior probability of X lying in that interval equals p.

Obvious corollary: your prior probability for X equaling a particular real number is zero. No amount of evidence can let you conclude the exact value of X, just narrower and narrower intervals to which it belongs. If you're in a situation where you expect it's possible to find out that the answer is some exact real number, then you need to take a prior which accounts for this, and averages this "continuous" prior with a countable discrete prior over the exact values that it could conceivably take. (There are several different questions and objections you could raise here, but this actually works, and I'm happy to say more if you want.)

Replies from: CronoDAS
comment by CronoDAS · 2011-02-15T02:38:54.484Z · LW(p) · GW(p)

Yeah, you can have a uniform continuous distribution on a finite interval. The problem is that it's not actually uninformative: if you know nothing about x other than that x is in [0,1], you also know nothing about x^2 other than that it, too, is in [0,1] - but if you use the uniform distribution for x, then you're not using a uniform distribution for x^2... I think the Jeffreys prior was supposed to solve this problem, but I don't really understand what a Jeffreys prior is in general...

Replies from: orthonormal
comment by orthonormal · 2011-02-15T03:07:26.278Z · LW(p) · GW(p)

but if you use the uniform distribution for x, then you're not using a uniform distribution for x^2

Well, x^2 isn't an isometry, so you shouldn't expect it to leave the prior unchanged.

Let me put it this way: if Omega told you ve had a real number x between 0 and 1, and then ve told you that x^1000 was between 3/10 and 4/10, you probably should be more surprised than if ve told you that x^1000 was between 0 and 1/10. Yes, you could pick your prior to have a uniform distribution for x^1000 rather than for x, but that doesn't seem a natural choice in general.

Replies from: Dan_Moore
comment by Dan_Moore · 2011-02-16T22:34:44.003Z · LW(p) · GW(p)

I have also been wondering about when it's appropriate to use a uniform prior.

As an example (I think) of an instance where a uniform prior is not appropriate: NBA stats people calculate points per minute played for all the players. In some cases, bench players have higher points per minutes played than the starters. However, it does not follow that the bench player should be starting (ignoring defensive stats).

This is because bench players tend to enter the game at a time when they will play against the opposing team's bench. So, presuming that the defensive skills of the opponent's bench are less than the defensive skills of the opponent's starters, it's clear that there is a non-uniform level of defense maintained by the opponent during the game - i.e., bench players should have an easier time scoring than starters. So there is not a simple apples to apples comparison & conclusion based on this stat.

comment by Eugine_Nier · 2011-02-14T00:51:16.314Z · LW(p) · GW(p)

There's no general answer to that question.

comment by jsalvatier · 2011-02-14T00:55:33.904Z · LW(p) · GW(p)

I don't think there is any kind of general result for this. It depends on what you're trying to infer. For example, are you trying to find a scale parameter? Are you trying to find a shape parameter? I think the most popular approach is to find the Maximm Entropy distribution. Admittedly I don't know a lot about the math behind this.

Replies from: atucker
comment by atucker · 2011-02-14T03:55:28.590Z · LW(p) · GW(p)

Math-wise, basically you pick whatever distribution maximizes integral from negative infinity to infinity of p(x)log(p(x)) with respect to x, multiplied by -1. So -Sp(x)log(p(x))dx.

The crux of this making sense is that that value can be interpreted as the amount of information you expect to learn from hearing that x happened. Or more straightforwardly, its how much you expect to not know about a particular variable/event. If you use log base 2, its measured in the average number of yes/no questions needed to concisely learn that it happened. For an explanation of why that's true, these articles are excellent.

The reason that you want to maximize this value in the distribution is that not doing so assumes that you have information that you don't know. Say you have 5 bits of entropy in the maximum entropy distribution, and 4 in some other one. If you choose the4 bit one then you're basically making up information by thinking that you need one fewer yes/no question than you actually do.

comment by lukeprog · 2011-02-14T06:30:28.995Z · LW(p) · GW(p)

What is a good introduction to Kolmogorov complexity and Solomonoff induction?

Replies from: endoself, CronoDAS, Matt_Simpson, XiXiDu
comment by endoself · 2011-02-14T07:40:19.096Z · LW(p) · GW(p)

You might find http://singinst.org/blog/2007/06/25/solomonoff-induction/ interesting, though it uses a non-Turing complete example. I could write a post on it if there is demand.

Replies from: Matt_Simpson, atucker
comment by Matt_Simpson · 2011-02-14T17:13:54.388Z · LW(p) · GW(p)

There's demand :)

comment by atucker · 2011-02-16T02:30:38.402Z · LW(p) · GW(p)

Umm I'm also noting my demand, in case its necessary.

comment by CronoDAS · 2011-02-14T07:25:01.315Z · LW(p) · GW(p)

I figured it out from Wikipedia...

comment by Matt_Simpson · 2011-02-14T17:13:43.044Z · LW(p) · GW(p)

This pdf seems to have an introduction, but it's technical.

comment by XiXiDu · 2011-02-14T16:10:36.949Z · LW(p) · GW(p)

Some links here. Especially this one.

comment by DanielVarga · 2011-02-14T18:26:45.395Z · LW(p) · GW(p)

A very nice computer science puzzle I have just heard. I don't know the solution, please don't spoil it.

Input: a word. Output: the longest prefix of the word that is also a suffix. So for a given w, find the longest x such that w=xy=zx, |y|>0.

Give a deterministic algorithm with the best possible asymptotics.

Replies from: cousin_it, sketerpot
comment by cousin_it · 2011-02-14T22:46:40.360Z · LW(p) · GW(p)

This is a subroutine of the Knuth-Morris-Pratt algorithm, which works in linear time.

comment by sketerpot · 2011-02-14T19:39:09.447Z · LW(p) · GW(p)

Nice one. My answer, in rot13:

V svther gurer'f ab jnl jr pna trg guvf gb or yrff guna B(a), fvapr jr arrq gb purpx rirel punenpgre bs gur cersvk/fhssvk gb znxr fher gung vg ernyyl vf n cersvk naq fhssvk. Qrgnvyf ner yrsg nf na rkrepvfr gb gur ernqre. ;-)

Svefg, pbafgehpg n fhssvk gerr. Guvf gnxrf B(a) gvzr. Gura znex gur fgevat vgfrys nf n sbeovqqra fhssvk, juvpu gnxrf B(a) gvzr. Gura frnepu sbe nf zhpu bs gur ortvaavat bs gur fgevat nf cbffvoyr va gur fhssvk gerr, nibvqvat gur sbeovqqra fhssvk. Sbe rknzcyr, vs gur fgevat vf "nanan", gura lbh jbhyq ybbx ng gur urnq bs gur fhssvk gerr naq tb qbja gur "n" oenapu, gura qbja gur "an" oenapu, naq gura lbh jbhyq nibvq gur "an" oenapu (juvpu vf znexrq sbeovqqra), vafgrnq tbvat gb gur raq-bs-fhssvk znexre. Lbhe ybatrfg cersvk/fhssvk jbhyq gura or "nan".

Gbgny gvzr pbzcyrkvgl: B(a). Gbgny fcnpr: B(a). Fhssvk gerrf ner nznmvat.

comment by Cyan · 2011-02-14T01:45:12.722Z · LW(p) · GW(p)

I want to learn enough measure theory to deal with theoretical papers on stochastic processes e.g.(warning: pdf). I need a text with lots of homework problems. Suggestions?

Replies from: chris_elliott, orthonormal
comment by chris_elliott · 2011-02-14T03:33:36.660Z · LW(p) · GW(p)

Disclaimer: I am not a probabilist.

One possibility would be some combination of the book Probability With Martingales by David Williams, and the lecture notes of James Norris available freely online here and here, depending on your taste. As the names suggest, both of these develop measure theory with probability theory in mind. Section A of Williams's book and the first set of Norris's notes cover basic measure theory, and sections B and C of Williams's book and the second set of Norris's notes cover more advanced topics (conditional expectation, Martingales, Brownian motion). I'm only really familiar with the former, not the latter, and depending on what exactly you need/want to know you might only work through the former, and maybe not even all of that.

Norris's notes in both cases are quite a bit terser and cover a little more material. Both the book and the notes have a large number of very helpful exercises compiled at the end. I remember finding some of Norris's exercises in particular a lot of fun!

Other books I'm aware of: D. H. Fremlin has an encyclopaedic set of volumes on measure theory available for free on his website here. I've found this useful as a reference. Some people like Rudin's Real and Complex Analysis for this kind of thing, but I wouldn't recommend it for your goals. (It's very difficult to jump into that book in the middle: you really need to follow the garden path from beginning to end, and I think that'd force you through a lot of functional analysis that you could survive without). Folland probably suffers from similar problems, but I'm not really familiar with it.

Good luck!

Replies from: Cyan
comment by Cyan · 2011-02-14T22:07:58.377Z · LW(p) · GW(p)

Many thanks!

comment by orthonormal · 2011-02-15T01:04:00.809Z · LW(p) · GW(p)

Durrett's Probability: Theory and Examples has many good exercises, and works its way up to Brownian motion. From there, I don't know what to recommend. I tried a reading course with Kallenberg's Foundations of Modern Probability, which focuses heavily on those topics, but the book didn't work for me.

comment by endoself · 2011-02-14T05:41:33.835Z · LW(p) · GW(p)

Cox's theorem requires finite additivity of probability, but not countable additivity. The Solomonoff prior, however, only makes sense given countable additivity. Is there any research on how this can be reconciled? Maybe there's a self evident axiom that can be used to prove countable additivity?

Replies from: None
comment by [deleted] · 2011-02-14T06:24:07.806Z · LW(p) · GW(p)

For probability to be a measure, which it's usually defined to be, it must be countably additive.

Replies from: endoself
comment by endoself · 2011-02-14T06:30:57.673Z · LW(p) · GW(p)

Why must probability be a measure? What is wrong with an agent that uses "sums" of infinite numbers of probabilities that do not satisfy countable additivity?

Replies from: AlephNeil
comment by AlephNeil · 2011-02-14T23:36:09.554Z · LW(p) · GW(p)

I guess there are lots of things to say here.

Firstly, note that countable additivity can only fail in one direction: the probability of a countable union of disjoint events can be strictly larger than the sum of the probabilities but it can't be strictly smaller.

Now suppose for a moment that we're only interested in discrete random variables that take at most countably many values. Then it's easy to see that any additive probability measure m which assigns probabilities to each singleton set can be uniquely decomposed as the sum of a sigma-additive measure k and an additive measure m* such that m*(S) = 0 for any finite set S. Perhaps m is one of a set of hypotheses that we're trying to evaluate by applying Bayesian reasoning to a data sample (consisting of some integer values). But then whichever data we receive, m* will assign it zero probability, and therefore drop out as irrelevant. The posterior probabilities will depend only on the sigma-additive components k.

Clearly the continuous case is more complicated - I don't know whether one can make an analogous argument. (Perhaps it would depend on denying that we ever receive 'real numbers' in our data. Rather, all we get are finite amounts of information about each real.)

Perhaps you want something like a "Dutch book" argument to show that dropping countable additivity is irrational? Well, any such argument would presuppose that if you would accept each of the wagers A[n] then you must also accept the sum of the A[n] (as long as it's well-defined), but as long as you reject that presupposition, you remain perfectly self-consistent.

Replies from: endoself
comment by endoself · 2011-02-14T23:55:43.956Z · LW(p) · GW(p)

You make a lot of interesting points.

In your example with m*, what if we have a random variable that takes values equal to 1/n, for integer n? Then maybe we measure whether n is even. Wouldn't the probability of this would depend on m* as well as k?

For the case of a Dutch book, the problem is that wagers do not satisfy countable additivity. Maybe we could identify some subset of wagers that do and use this in a proof?

comment by Daniel_Burfoot · 2011-02-14T03:28:36.527Z · LW(p) · GW(p)

Does anyone keep highly up-to-date with the machine learning, computer vision, and AI literature? Are there any "hot" new topics people are excited about?

Replies from: lukeprog, None
comment by lukeprog · 2011-02-14T06:29:15.703Z · LW(p) · GW(p)

There are tons of blogs that discuss the latest exciting development in various fields of philosophy. That makes it really easy to keep up with the latest developments without paying to subscribe to all the major journals or even browsing the abstracts online to find the interesting few of each month.

Aren't there such blogs on the subjects of machine learning and AI? I would suspect so...

Replies from: Daniel_Burfoot
comment by Daniel_Burfoot · 2011-02-14T17:22:42.830Z · LW(p) · GW(p)

Yeah, there are a few. But for the most part, I think AI people don't tend to blog as much as philosophers.

comment by [deleted] · 2011-02-14T06:27:11.139Z · LW(p) · GW(p)

Intention from motion seems pretty fascinating.

Replies from: RichardKennaway
comment by RichardKennaway · 2011-02-14T13:13:37.948Z · LW(p) · GW(p)

The link doesn't work for me. Is this what you're referring to? It's paywalled though -- I can only see the first few pages. Googling "intention from motion" brings up some more accessible papers.