Posts

On scalable oversight with weak LLMs judging strong LLMs 2024-07-08T08:59:58.523Z
Power-seeking can be probable and predictive for trained agents 2023-02-28T21:10:25.900Z

Comments

Comment by janos on Problems with learning values from observation · 2016-09-21T02:49:00.518Z · LW · GW

Is there a reason to think this problem is less amenable to being solved by complexity priors than other learning problems? / Might we build an unaligned agent competent enough to be problematic without solving problems similar to this one?

Comment by janos on Learning Mathematics in Context · 2016-01-27T17:12:34.407Z · LW · GW

What is Mathematics? by Courant and Robbins is a classic exploration that goes reasonably deep into most areas of math.

Comment by janos on Superintelligence 8: Cognitive superpowers · 2015-05-07T21:07:56.435Z · LW · GW

This makes me think of two very different things.

One is informational containment, i.e. how to run an AGI in a simulated environment that reveals nothing about the system it's simulated on; this is a technical challenge, and if interpreted very strictly (via algorithmic-complexity arguments about how improbable our universe is likely to be under something like a Solomonoff prior), it is very constraining.

The other is futurological simulation; here I think the notion of simulation is pointing at a tool, but the idea of using this tool is a very small part of the approach relative to formulating a model with the right sort of moving parts. The latter has been tried with various simple models (e.g. the thing in Ch 4); more work can be done, but justifying the models and priors will be difficult.

Comment by janos on Why IQ shouldn't be considered an external factor · 2015-04-28T19:58:14.623Z · LW · GW

Certainly, interventions may be available, just as for anything else; but it's not fundamentally more accessible or malleable than other things.

Comment by janos on Why IQ shouldn't be considered an external factor · 2015-04-04T19:35:22.672Z · LW · GW

I'm arguing that the fuzzy-ish definition that corresponds to our everyday experience/usage is better than the crisp one that doesn't.

Re IQ and "way of thinking", I'm arguing they both affect each other, but neither is entirely under conscious control, so it's a bit of a moot point.

Apropos the original point, under my usual circumstances (not malnourished, hanging out with smart people, reading and thinking about engaging, complex things that can be analyzed and have reasonable success measures, etc), my IQ is mostly not under my control. (Perhaps if I was more focused on measurements, nootropics, and getting enough sleep, I could increase my IQ a bit; but not very much, I think.) YMMV.

Comment by janos on Why IQ shouldn't be considered an external factor · 2015-04-04T19:03:32.574Z · LW · GW

I think what you're saying is that if we want a coherent, nontrivial definition of "under our control" then the most natural one is "everything that depends on the neural signals from your brain". But this definition, while relatively clean from the outside, doesn't correspond to what we ordinarily mean; for example, if you have a mental illness, this would suggest that "stop having that illness!!" is reasonable advice, because your illness is "under your control".

I don't know enough neuroscience to give this a physical backing, but there are certain conscious decisions or mental moves that feel like they're very much under my control, and I'd say the things under my control are just those, plus the things I can reliably affect using them. I think the correct intuitive definition of "locus of control" is "those things you can do if you want to".

Regarding causal arrows between your IQ and your thoughts, I don't think this is a well-defined query. Causality is entirely about hypothetical interventions; to say "your way of thinking affects your IQ" is just to say that if I was to change your way of thinking, I could change your IQ.

But how would I change your way of thinking? There has to be an understanding of what is being held constant, or of what range of changes we're talking about. For instance we could change your way of thinking to any that you'd likely reach from different future influences, or to any that people similar to you have had, etc. Normally what we care about is the sort of intervention that we could actually do or draw predictions from, so the first one here is what we mean. And to some degree it's true, your IQ would be changed.

From the other end, what does it mean to say your way of thinking is affected by your IQ? It means if we were to "modify your IQ" without doing anything else to affect your thinking, then your way of thinking would be altered. This seems true, though hard to pin down, since IQ is normally thought of as a scalar, rather than a whole range of phenomena like your "way of thinking". IQ is sort of an amalgam of different abilities and qualities, so if we look closely enough we'll find that IQ can't directly affect anything at all, similarly to how g can't ("it wasn't your IQ that helped you come up with those ideas, it was your working memory, and creativity, and visualization ability!"); but on the other hand if most things that increase IQ make the same sort of difference (eg to academic success) then it's fairly compact and useful to say that IQ affects those things.

Causality with fuzzy concepts is tricky.

Comment by janos on Harry Potter and the Methods of Rationality discussion thread, February 2015, chapter 113 · 2015-02-28T20:53:37.802Z · LW · GW

March 2nd isn't a Tuesday; is it Monday night or Tuesday night?

Comment by janos on How many words do we have and how many distinct concepts do we have? · 2014-12-18T18:17:52.692Z · LW · GW

If you want to discuss the nature of reality using a similar lexicon to what philosophers use, I recommend consulting the Stanford Encyclopedia of Philosophy: http://plato.stanford.edu/

Comment by janos on Link: Elon Musk wants gov't oversight for AI · 2014-10-28T16:47:43.528Z · LW · GW

Musk has joined the advisory boards of FLI and CSER, which are younger sibling orgs of FHI and MIRI. He's aware of the AI x-risk community.

Comment by janos on [MIRIx Cambridge MA] Limiting resource allocation with bounded utility functions and conceptual uncertainty · 2014-10-05T16:08:03.298Z · LW · GW

Cool. Regarding bounded utility functions, I didn't mean you personally, I meant the generic you; as you can see elsewhere in the thread, some people do find it rather strange to think of modelling what you actually want as a bounded utility function.

This is where I thought you were missing the point:

Or you might say it's a suboptimal outcome because you just know that this allocation is bad, or something. Which amounts to saying that actually you know what the utility function should be and it isn't the one the analysis assumes.

Sometimes we (seem to) have stronger intuitions about allocations than about the utility function itself, and parlaying that to identify what the utility function should be is what this post is about. This may seem like a non-step to you; in that case you've already got it. Cheers! I admit it's not a difficult point. Or if you always have stronger intuitions about the utility function than about resource allocation, then maybe this is useless to you.

I agree with you that there are some situations where the sublinear allocation (and exponentially-converging utility function) seems wrong and some where it seems fine; perhaps the post should initially have said "person-enjoying-chocolate-tronium" rather than chocolate.

Comment by janos on [MIRIx Cambridge MA] Limiting resource allocation with bounded utility functions and conceptual uncertainty · 2014-10-04T18:20:21.579Z · LW · GW

Certainly given a utility function and a model, the best thing to do is what it is. The point was to show that some utility functions (eg using the exponential-decay sigmoid) have counterintuitive properties that don't match what we'd actually want.

Every response to this post that takes the utility function for granted and remarks that the optimum is the optimum is missing the point: we don't know what kind of utility function is reasonable, and we're showing evidence that some of them give optima that aren't what we'd actually want if we were turning the world into chocolate/hedonium.

If it seems strange to you to consider representing what you want by a bounded utility function, a post about that will be forthcoming.

Comment by janos on An Attempt at Logical Uncertainty · 2014-06-30T13:36:02.016Z · LW · GW

One nonconstructive (and wildly uncomputable) approach to the problem is this one: http://www.hutter1.net/publ/problogics.pdf

Comment by janos on How much to spend on a high-variance option? · 2013-01-04T02:33:55.512Z · LW · GW

I think you're making the wrong comparisons. If you buy $1 worth, you get p(win) U(jackpot) + (1-p(win)) U(-$1), which is more-or-less p(win) U(jackpot) + U(-$1); this is a good idea if p(win) U(jackpot) > -U(-$1). But under usual assumptions -U(-$2) > -2U(-$1), so each additional dollar spent costs more utility than the last, and at some point further tickets stop being worth it. This adds up to normality; you shouldn't actually spend all your money. :)
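
A quick numerical illustration, as a minimal sketch with made-up numbers: log utility of total wealth W, a jackpot J, a per-dollar win probability p, and the simplifying assumption that the n tickets cover disjoint outcomes, so the chance of exactly one win is n*p.

```python
import math

W, J, p = 10_000.0, 1_000_000.0, 2.5e-5   # all hypothetical

def U(x):                 # utility of a change x to current wealth, log utility
    return math.log(W + x) - math.log(W)

def EU(n):                # expected utility of buying n one-dollar tickets
    return n * p * U(J - n) + (1 - n * p) * U(-n)

print(EU(1))      # slightly positive: one ticket is (barely) worth it here
print(EU(9_999))  # very negative: betting nearly everything is not
```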

Comment by janos on Ambitious utilitarians must concern themselves with death · 2012-10-25T14:48:37.127Z · LW · GW

One good negation is "the value/intrinsic utility of a life is the sum of the values/intrinsic utilities of all the moments/experiences in it, evaluated without reference to their place/context in the life story, except inasmuch as is actually part of that moment/experience".

The "actually" gets traction if people's lives follow narratives that they don't realize as they're happening, but such that certain narratives are more valuable than others; this seems true.

Comment by janos on Statisticsish Question · 2011-11-28T17:16:09.283Z · LW · GW

If your prior distribution for "yes" conditional on the number of papers is still uniform, i.e. if the number of papers has nothing to do with whether they're "yes" or not, then the rule still applies.

Comment by janos on "Friends do not let friends compute p values." · 2011-09-09T18:51:07.223Z · LW · GW

You can comfortably do Bayesian model comparison here; have priors for µ_con, µ_amn, and µ_sim, and let µ_pat be either µ_amn (under hypothesis H_amn) or µ_sim (under hypothesis H_sim), and let H_amn and H_sim be mutually exclusive. Then integrating out µ_con, µ_amn, and µ_sim, you get a marginal odds-ratio for H_amn vs H_sim, which tells you how to update.
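
For concreteness, here is a minimal numerical sketch of that comparison, assuming made-up data, a normal model with known noise sd, and the same normal prior for µ_amn and µ_sim (the µ_con factor is common to both hypotheses and cancels out of the odds ratio):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y_amn = rng.normal(40, 5, size=12)   # amnesiac group scores (hypothetical)
y_sim = rng.normal(55, 5, size=12)   # simulator group scores (hypothetical)
y_pat = rng.normal(42, 5, size=8)    # the patient's scores (hypothetical)
sigma = 5.0                          # noise sd, assumed known

mu = np.linspace(0, 100, 2001)       # grid over the group means
dmu = mu[1] - mu[0]
prior = stats.norm(50, 20).pdf(mu)
prior /= prior.sum() * dmu           # normalize the prior on the grid

def marginal(*groups):
    # p(data) = integral over mu of prior(mu) * prod_i N(y_i | mu, sigma)
    like = np.ones_like(mu)
    for y in groups:
        for yi in y:
            like *= stats.norm.pdf(yi, loc=mu, scale=sigma)
    return np.sum(prior * like) * dmu

# H_amn: the patient shares mu_amn.  H_sim: the patient shares mu_sim.
m_amn = marginal(y_amn, y_pat) * marginal(y_sim)
m_sim = marginal(y_amn) * marginal(y_sim, y_pat)
print("Bayes factor (H_amn : H_sim):", m_amn / m_sim)
```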

The standard frequentist method being discussed is nested hypothesis testing, where you want to test null hypothesis H0 with alternative hypothesis H1, and H0 is supposed to be nested inside H1. For instance you could easily test null hypothesis µ_con >= µ_amn >= µ_pat = µ_sim against µ_con >= µ_amn >= µ_pat >= µ_sim. However, for testing non-nested hypotheses, the methodology is weaker, or at least less standard.

Comment by janos on Take heed, for it is a trap · 2011-08-15T16:58:44.479Z · LW · GW

"Alice is a banker" is a simpler statement than "Alice is a feminist banker who plays the piano.". That's why the former must be assigned greater probability than the latter.

Complexity weights apply to worlds/models, not propositions. Otherwise you might as well say:

"Alice is a banker" is a simpler statement than "Alice is a feminist, a banker, or a pianist.". That's why the former must be assigned greater probability than the latter.

Comment by janos on Looking for information on scoring calibration · 2011-04-07T22:59:53.495Z · LW · GW

tl;dr: miscalibration means mentally interpreting loglikelihood of data as being more or less than its actual loglikelihood; to infer it you need to assume/infer the Bayesian calculation that's being made/approximated. Easiest with distributions over finite sets (i.e. T/F or multiple-choice questions). Also, likelihood should be called evidence.

I wonder why I didn't respond to this when it was fresh. Anyway, I was running into this same difficulty last summer when attempting to write software to give friendly outputs (like "calibration") to a bunch of people playing the Aumann game with trivia questions.

My understanding was that evidence needs to be measured on the logscale (as the difference between prior and posterior), and miscalibration is when your mental conversion from gut feeling of evidence to the actual evidence has a multiplicative error in it. (We can pronounce this as: "the true evidence is some multiplicative factor (called the calibration parameter) times the felt evidence".) This still seems like a reasonable model, though of course different kinds of evidence are likely to have different error magnitudes, and different questions are likely to get different kinds of evidence, so if you have lots of data you can probably do better by building a model that will estimate your calibration for particular questions.

But sticking to the constant-calibration model, it's still not possible to estimate your calibration from your given confidence intervals, because for that we need an idea of what your internal prior (your "prior" prior, before you've taken into account the felt evidence) is, and that is hard to get any decent sense of. You can work off of iffy assumptions, such as assuming that your prior for percentage answers from a trivia game is fitted to the set of all the percentage answers from this trivia game, and has some simple form (e.g. Beta). The Aumann game gave an advantage in this respect, because rather than comparing your probability distribution before and after thinking about the question, it makes it possible to compare the distribution before and after hearing other people's arguments and evidence; if you always speak in terms of standard probability distributions, it's not too hard to infer your calibration there.

Further "funny" issues can arise when you get down to work; for instance if your prior was a Student-t with df n1 and your posterior was a Student-t with df n2s1^2 then your calibration cannot be more than 1/(1-s1^2/s2^2) without having your posterior explode. It's tempting to say the lesson is that things break if you're becoming asymptotically less certain, which makes some intuitive sense: if your distributions are actually mixtures of finitely many different hypotheses that you're Bayesianly updating the weights of, then you will never become asymptotically less certain; in particular the Student-t scenario I described can't happen. However this is not a satisfactory conclusion because the Normal scenario (where you increase your variance by upweighting a hypothesis that gives higher variance) can easily happen.

A different resolution to the above is that the model of evidence=calibration*felt evidence is wrong, and needs an error term or two; that can give a workable result, or at least not catch fire and die.

Another thought: if your mental process is like the one two paragraphs up, where you're working with a mixture of several fixed (e.g. normal) hypotheses, and the calibration concept is applied to how you update the weights of the hypotheses, then the change in the mixture distribution (i.e. the marginal) will not follow anything like the calibration model.

So the concept is pretty tricky unless you carefully choose problems where you can reasonably model the mental inference, and in particular try to avoid "mixture-of-hypotheses"-type scenarios (unless you know in advance precisely what the hypotheses imply, which is unusual unless you construct the questions that way... but then I can't think of why you'd ask about the mixture instead of about the probabilities of the hypotheses themselves).

You might be okay when looking at typical multiple-choice questions; certainly you won't run into the issues with broken posteriors and invalid calibrations. Another advantage is that "the" prior (i.e. uniform) is uncontroversial, though whether the prior to use for computing calibration should be "the" prior is not obvious; but if you don't have before-and-after results from people then I guess it's the best you can do.
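
As a concrete (if toy) version of the constant-calibration model for true/false questions with a uniform prior: felt evidence is the log-odds of your stated probability, and the calibration parameter is fit by maximum likelihood against which answers were actually right. The data here are made up.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logit, expit

stated  = np.array([0.9, 0.7, 0.99, 0.6, 0.8, 0.95, 0.55, 0.85])   # hypothetical
correct = np.array([1,   1,   0,    1,   0,   1,    1,    1], dtype=bool)

felt = logit(stated)        # felt evidence in log-odds (uniform prior = 0 log-odds)

def neg_loglik(c):
    p = expit(c * felt)     # recalibrated probability of being right
    return -np.sum(np.where(correct, np.log(p), np.log(1 - p)))

c_hat = minimize_scalar(neg_loglik, bounds=(0.01, 10), method="bounded").x
print("estimated calibration:", c_hat)   # below 1 suggests overconfidence
```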

I just noticed that what's usually called the "likelihood" I was calling "evidence" here. This has probably been suggested by someone before, but: I've never liked the term "likelihood", and this is the best replacement for it that I know of.

Comment by janos on Inverse Speed · 2011-03-27T13:22:11.035Z · LW · GW

The way I'd try to do this problem mentally would be:

Relative to the desired concentration of 55%, each unit of 40% is missing .15 units of alcohol, and each unit of 85% has .3 extra units of alcohol. .15:.3=1:2, so to balance these out we need (amount of 40%):(amount of 85%)=2:1, i.e. we need twice as much 40% as 85%. Since we're using 1kg of 40%, this means 0.5kg of 85%.
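
A one-line check of the ratio (using 1 kg of 40% and 0.5 kg of 85%):

```python
m40, m85 = 1.0, 0.5                              # kg of each solution
print((0.40 * m40 + 0.85 * m85) / (m40 + m85))   # 0.55
```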

Comment by janos on Science Journalism and How To Present Probabilities [Link] · 2011-03-15T19:41:45.282Z · LW · GW

I prefer your phrasing.

Comment by janos on Science Journalism and How To Present Probabilities [Link] · 2011-03-14T19:15:58.537Z · LW · GW

Nope: the odds ratio was (.847/(1-.847))/(.906/(1-.906)), which is indeed 57.5%, which could be rounded to 60%. If the starting probability was, say, 1%, rather than 90.6%, then translating the odds ratio statement to "60% as likely" would be legitimate, and approximately correct; probably the journalist learned to interpret odds ratios via examples like that. But when the probabilities are close to 1, it's more correct to say that the women/blacks were 60% more likely to not be referred.
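
Spelled out, using the two referral rates under discussion:

```python
p_women, p_men = 0.847, 0.906   # referral rates being compared

print((p_women / (1 - p_women)) / (p_men / (1 - p_men)))  # ~0.575: the odds ratio, i.e. the "60%" figure
print(p_women / p_men)                                     # ~0.93: women were ~93% as likely to be referred
print((1 - p_women) / (1 - p_men))                         # ~1.63: about 60% more likely to NOT be referred
```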

Comment by janos on What Else Would I Do To Make a Living? · 2011-03-04T03:23:33.223Z · LW · GW

It's just a vanilla (MH) MCMC sampler for (some convenient family of) distributions on polytopes; hopefully like this: http://cran.r-project.org/web/packages/limSolve/vignettes/xsample.pdf, but faster. It's motivated by a model for inferring network link traffic flows from counts of in- and out-bound traffic at each node; the solution space is a polytope, and we want to take advantage of previous observations to form a better prior. But for the approach to be feasible we first need to sample.
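
Roughly this kind of thing, sketched with made-up numbers (a random-walk Metropolis sampler for a density restricted to a polytope {x : Ax <= b}, with a Gaussian target standing in for the convenient family; not the actual project code):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[ 1.0,  1.0],    # hypothetical polytope: x + y <= 1,
              [-1.0,  0.0],    #                        x >= 0,
              [ 0.0, -1.0]])   #                        y >= 0
b = np.array([1.0, 0.0, 0.0])

def log_density(x):
    return -0.5 * x @ x if np.all(A @ x <= b) else -np.inf

x = np.array([0.25, 0.25])     # a feasible starting point
samples = []
for _ in range(10_000):
    prop = x + 0.1 * rng.normal(size=2)
    if np.log(rng.uniform()) < log_density(prop) - log_density(x):
        x = prop                # accept; proposals outside the polytope are always rejected
    samples.append(x)

print(np.mean(samples, axis=0))
```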

But this is not a long-term project, I think.

Comment by janos on What Else Would I Do To Make a Living? · 2011-03-04T03:13:17.747Z · LW · GW

Looks like good stuff ... thanks for the tip.

Comment by janos on What Else Would I Do To Make a Living? · 2011-03-04T02:59:03.641Z · LW · GW

Currently I'm taking classes and working on a polytope sampler. I tend to be excited about Bayesian nonparametrics and consistent families of arbitrary-dimensional priors. I'm also excited about general-purpose MCMC-like approaches, but so far I haven't thought very hard about them.

Comment by janos on What Else Would I Do To Make a Living? · 2011-03-02T21:58:00.554Z · LW · GW

In undergrad I feared a feeling of locked-in-ness, and ditched my intention to do a PhD in math (which I think I could have done well in) partly for this reason, though it was also easier for me because I hadn't established close ties to a particular line of research, and because I had programming background. I worked a couple of years in programming, and now I'm back in school doing a PhD in stats, because I like probability spaces and because I wanted to do something more mathematical than (most) programming. I guess I picked stats over applied math partly out of the same worry about overspecialization; I think stats has a bigger wealth of better-integrated more widely applicable concepts/insights.

Comment by janos on Open Thread: Mathematics · 2011-02-14T01:36:52.214Z · LW · GW

Would you be surprised if the absolute value was bigger than 3^^^3? I'm guessing yes, very much so. So that's a reason not to use an improper prior.

If there's no better information about the problem, I sort of like using crazy things like Normal(0,1)*exp(Cauchy); that way you usually get reasonable smallish numbers, but you don't become shocked by huge or tiny numbers either. And it's proper.
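
To get a feel for what that prior looks like (a quick sketch; the overflow guard is just because extreme Cauchy draws exponentiate to inf, which is part of the heavy-tail point):

```python
import numpy as np

rng = np.random.default_rng(0)
with np.errstate(over="ignore"):
    draws = rng.normal(size=100_000) * np.exp(rng.standard_cauchy(size=100_000))

print(np.median(np.abs(draws)))        # typically a smallish number
print(np.mean(np.abs(draws) > 1e6))    # but huge values are not shocking
```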

Comment by janos on An Abortion Dialogue · 2011-02-12T07:46:08.009Z · LW · GW

I wasn't trying to present a principled distinction, or trying to avoid bias. What I was saying isn't something I'm going to defend. The only reason I responded to your criticism of it was that I was annoyed by the nature of your objection. However, since now I know you thought I was trying to say more than I actually was, I will freely ignore your objection.

Comment by janos on An Abortion Dialogue · 2011-02-12T06:48:23.778Z · LW · GW

Do you have an instance of "I proactively do X" where you do not class it as reactive? Do you have an instance of "I wish to avoid Y" where you do not class it as specific? I don't like conversations about definitions. I was using these words to describe a hypothetical inner experience; I don't claim that they aren't fuzzy. You seem to be pointing at the fuzziness and saying that they're meaningless; I don't see why you'd want to do that.

Comment by janos on An Abortion Dialogue · 2011-02-12T05:58:37.887Z · LW · GW

It seems to me that we mean different things by the words "reactive" (as opposed to proactive) and "specific". A weak attempt at a reductio: I proactively do X to avoid facing Y; I am thus reacting to my desire to avoid facing Y. And is Y general or specific? Y is the specific Y that I do X to avoid facing.

Comment by janos on An Abortion Dialogue · 2011-02-12T05:38:43.344Z · LW · GW

Ah, yes indeedy true. I guess I was thinking of abstinence. So wrong distinction. More likely, then: abortion is done to a specific embryo who is thereby prevented from being, and it's done reactively; there's no question that when you have an abortion it's about deciding to kill this particular embryo. Contraceptive use on the other hand is nonspecific and proactive; it doesn't feel like "I discard these reproductive cells which would have become a person!", it feels like exerting prudent control over your life.

Comment by janos on An Abortion Dialogue · 2011-02-12T04:10:17.019Z · LW · GW

I agree with your main point (that this is a stumbling block for some people), but there are others who will contend that A and part of B (namely the irreversible error) do apply to unwanted babies (usually, or on average), and that the reason why abortion is more evil than contraception is because it's an error of commission rather than omission.

Comment by janos on Procedural Knowledge Gaps · 2011-02-09T04:57:01.279Z · LW · GW

But I drink orange juice with pulp; then the fiber is no longer absent, though I guess it's reduced. The vitamins and minerals are still present, though, aren't they?

Comment by janos on Procedural Knowledge Gaps · 2011-02-08T03:30:12.614Z · LW · GW

Regarding the fruit juices, I agree that fruit-flavored mixtures of HFCS and other things generally aren't worth much, but aren't proper fruit juices usually nutritious? (I mean the kinds where the ingredients consist of fruit juices, perhaps water, and nothing else.)

Comment by janos on Procedural Knowledge Gaps · 2011-02-07T04:23:02.225Z · LW · GW

Regarding investment, my suggestion (if you work in the US) is to open a basic (because it doesn't periodically charge you fees) E*TRADE account here. They will provide an interface for buying and selling shares of stocks and various other things (ETFs and such; I mention stocks and ETFs because those are the only things I've tried doing anything with). They will charge you $10 for every transaction you make, so unless you're going to be (or become) active/clever enough to make it worthwhile, it makes sense not to trade too frequently.

EDIT: These guys appear to charge less, though they also deal in fewer things (e.g. no bonds).

Comment by janos on A bit meta: Do posts come in batches? If so, why? · 2011-02-06T18:37:22.640Z · LW · GW

Echoing the others:

If we suppose these are 22 iid samples from a Poisson then the max likelihood estimate for the Poisson parameter is 0.82 (the sample mean). Simulating such draws from such a Poisson and looking at sample correlation between Jan 15-Feb 4 and Jan 16-Feb 5, the p-value is 0.1. And when testing Poisson-ness vs negative binomial clustering (with the same mean), the locally most powerful test uses statistic (x-1.32)^2, and gives a simulated p-value of 0.44.
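
A sketch of the sort of simulation described (the actual daily post counts aren't reproduced here, so r_obs below is just a placeholder for the observed correlation between the two overlapping 21-day windows, i.e. the lag-one sample correlation of the 22 counts):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, r_obs = 0.82, 22, 0.35      # r_obs is hypothetical

sims = []
for _ in range(20_000):
    x = rng.poisson(lam, size=n)
    if x[:-1].std() > 0 and x[1:].std() > 0:   # correlation undefined for constant windows
        sims.append(np.corrcoef(x[:-1], x[1:])[0, 1])

print(np.mean(np.array(sims) >= r_obs))        # simulated one-sided p-value
```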

Comment by janos on My hour-long interview with Yudkowsky on "Becoming a Rationalist" · 2011-02-06T03:54:36.571Z · LW · GW

It's provided in the linked page; you need to scroll down to see it.

Comment by janos on Probability Space & Aumann Agreement · 2009-12-12T06:14:17.775Z · LW · GW

What I don't like about the example you provide is: what player 1 and player 2 know needs to be common knowledge. For instance if player 1 doesn't know whether player 2 knows whether die 1 is in 1-3, then it may not be common knowledge at all that the sum is in 2-6, even if player 1 and player 2 are given the info you said they're given.

This is what I was confused about in the grandparent comment: do we really need I and J to be common knowledge? It seems so to me. But that seems to be another assumption limiting the applicability of the result.

Comment by janos on Probability Space & Aumann Agreement · 2009-12-11T16:42:10.919Z · LW · GW

As far as I understand, agent 1 doesn't know that agent 2 knows A2, and agent 2 doesn't know that agent 1 knows A1. Instead, agent 1 knows that agent 2's state of knowledge is in J and agent 2 knows that agent 1's state of knowledge is in I. I'm a bit confused now about how this matches up with the meaning of Aumann's Theorem. Why are I and J common knowledge, and {P(A|I)=q} and {P(A|J)=q} common knowledge, but I(w) and J(w) are not common knowledge? Perhaps that's what the theorem requires, but currently I'm finding it hard to see how I and J being common knowledge is reasonable.

Edit: I'm silly. I and J don't need to be common knowledge at all. It's not agent 1 and agent 2 who perform the reasoning about I meet J, it's us. We know that the true common knowledge is a set from I meet J, and that therefore if it's common knowledge that agent 1's posterior for the event A is q1 and agent 2's posterior for A is q2, then q1=q2. And it's not unreasonable for these posteriors to become common knowledge without I(w) and J(w) becoming common knowledge. The theorem says that if you're both perfect Bayesians and you have the same priors then you don't have to communicate your evidence.

But if I and J are not common knowledge then I'm confused about why any event that is common knowledge must be built from the meet of I and J.

Comment by janos on Probability Space & Aumann Agreement · 2009-12-11T16:10:54.624Z · LW · GW

That simplification is a situation in which there is no common knowledge. In world-state w, agent 1 knows A1 (meaning knows that the correct world is in A1), and agent 2 knows A2. They both know A1 union A2, but that's still not common knowledge, because agent 1 doesn't know that agent 2 knows A1 union A2.

I(w) is what agent 1 knows, if w is correct. If all you know is S, then the only thing you know agent 1 knows is I(S), and the only thing that you know agent 1 knows agent 2 knows is J(I(S)), and so forth. This is why the usual "everyone knows that everyone knows that ... " definition of common knowledge translates to I(J(I(J(I(J(...(w)...).

Comment by janos on Probability Space & Aumann Agreement · 2009-12-11T15:48:52.173Z · LW · GW

Huh? The reference set Ω is the set of possible world histories, out of which one element is the actual world history. I don't see what's wrong with this.

Comment by janos on Probability Space & Aumann Agreement · 2009-12-11T15:33:42.965Z · LW · GW

Nope; it's the limit of I(J(I(J(I(J(I(J(...(w)...), where I(S) for a set S is the union of the elements of I that have nonempty intersections with S, i.e. the union of I(x) over all x in S, and J(S) is defined the same way.

Alternately if instead of I and J you think about the sigma-algebras they generate (let's call them sigma(I) and sigma(J)), then sigma(I meet J) is the intersection of sigma(I) and sigma(J). I prefer this somewhat because the machinery for conditional expectation is usually defined in terms of sigma-algebras, not partitions.
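
A toy illustration of that iteration (the partitions are made up): starting from {w}, alternately expand by I and by J until the set stops growing; the fixed point is the cell of I meet J containing w.

```python
I = [{1, 2}, {3, 4}, {5}]       # agent 1's information partition of Omega = {1,...,5}
J = [{1}, {2, 3}, {4}, {5}]     # agent 2's information partition

def expand(partition, S):
    # union of the partition cells that intersect S
    return set().union(*(cell for cell in partition if cell & S))

def meet_cell(w):
    S = {w}
    while True:
        T = expand(J, expand(I, S))
        if T == S:
            return S
        S = T

print(meet_cell(1))   # {1, 2, 3, 4}
print(meet_cell(5))   # {5}
```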

Comment by janos on Bayesian Flame · 2009-08-04T14:31:12.310Z · LW · GW

Right, that is a good piece. But I'm afraid I was unclear. (Sorry if I was.) I'm looking for a prior over stationary sequences of digits, not just sequences. I guess the adjective "stationary" can be interpreted in two compatible ways: either I'm talking about sequences such that for every possible string w the proportion of substrings of length |w| that are equal to w, among all substrings of length |w|, tends to a limit as you consider more and more substrings (either extending forward or backward in the sequence); this would not quite be a prior over generators, and isn't what I meant.

The cleaner thing I could have meant (and did) is the collection of stationary sequence-valued random variables, each of which (up to isomorphism) is completely described by the probabilities p_w of a string of length |w| coming up as w. These, then, are generators.

Comment by janos on Bayesian Flame · 2009-07-29T06:04:34.689Z · LW · GW

Each element of the set is characterized by a bunch of probabilities; for example there is p_01101, which is the probability that elements x_{i+1} through x_{i+5} are 01101, for any i. I was thinking of using the topology induced by these maps (i.e. generated by preimages of open sets under them).

How is putting a noninformative prior on the reals hard? With the usual required invariance, the uniform (improper) prior does the job. I don't mind having the prior be improper here either, and as I said I don't know what invariance I should want; I can't think of many interesting group actions that apply. Though of course 0 and 1 should be treated symmetrically; but that's trivial to arrange.

I guess you're right that regularities can be described more generally with computational models; but I expect them to be harder to deal with than this (relatively) simple, noncomputational (though stochastic) model. I'm not looking for regularities among the models, so I'm not sure how a computational model would help me.

Comment by janos on Bayesian Flame · 2009-07-28T15:42:44.452Z · LW · GW

The purpose would be to predict regularities in a "language", e.g. to try to achieve decent data compression in a way similar to other Markov-chain-based approaches. In terms of properties, I can't think of any nontrivial ones, except the usual important one that the prior assign nonzero probability to every open set; mainly I'm just trying to find something that I can imagine computing with.

It's true that there exists a bijection between this space and the real numbers, but it doesn't seem like a very natural one, though it does work (it's measurable, etc). I'll have to think about that one.

Comment by janos on Bayesian Flame · 2009-07-27T16:56:36.640Z · LW · GW

Since we're discussing (among other things) noninformative priors, I'd like to ask: does anyone know of a decent (noninformative) prior for the space of stationary, bidirectionally infinite sequences of 0s and 1s?

Of course in any practical inference problem it would be pointless to consider the infinite joint distribution, and you'd only need to consider what happens for a finite chunk of bits, i.e. a higher-order Markov process, described by a bunch of parameters (probabilities) which would need to satisfy some linear inequalities. So it's easy to find a prior for the space of mth-order Markov processes on {0,1}; but these obvious (uniform) priors aren't coherent with each other.
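
A small illustration of that incoherence at the lowest orders, under the assumption that "coherent" means the order-1 prior should induce the same distribution over the marginal bit frequency that the "obvious" order-0 prior assigns directly (a made-up simulation, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Uniform prior on the two order-1 parameters:
# a = P(next = 1 | prev = 0), d = P(next = 1 | prev = 1).
a = rng.uniform(size=N)
d = rng.uniform(size=N)

# Implied stationary marginal P(bit = 1) of each sampled chain.
p1 = a / (a + 1 - d)

# The "obvious" order-0 prior would make P(bit = 1) Uniform(0, 1):
# mean 0.5, sd 1/sqrt(12) ~ 0.289.  The induced distribution has the same
# mean by symmetry but a visibly smaller spread, so the two priors disagree.
print(p1.mean(), p1.std())
```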

I suppose it's possible to normalize these priors so that they're coherent, but that seems to result in much ugliness. I just wonder if there's a more elegant solution.

Comment by janos on Bayesian Flame · 2009-07-27T15:55:30.208Z · LW · GW

Updated, eh? Where did your prior come from? :)

Comment by janos on Bayesian Flame · 2009-07-27T15:48:23.172Z · LW · GW

I am trying to understand the examples on that page, but they seem strange; shouldn't there be a model with parameters, and a prior distribution for those parameters? I don't understand the inferences. Can someone explain?

Comment by janos on Religion, Mystery, and Warm, Soft Fuzzies · 2009-05-15T16:24:50.874Z · LW · GW

I think you're confusing the act of receiving information/understanding about an experience with the experience itself.

Re: the joke example, I think that one would get tired of hearing a joke too many times, and that's what the dissection is equivalent to, because you keep hearing it in your head; but if you already get the joke, the dissection is not really adding to your understanding. If you didn't get the joke, you will probably receive a twinge of enjoyment at the moment when you finally do understand. If you don't understand a joke, I don't think you can get warm fuzzies from it.

With hormones, again I think that being explicitly reminded of the role of hormones in physical attraction while experiencing physical attraction reduces warm fuzzies only because it's distracting you from the source of the warm fuzzies and making you feel self-conscious. On the other hand, knowing more about the role of hormones should not generally distract you from your physical attraction; instead you could use it to (ta-da!) get more warm fuzzies.

Comment by janos on Generalizing From One Example · 2009-05-01T15:32:42.465Z · LW · GW

Interesting. My internal experience of programming is quite different; I don't see boxes and lines. Data structures for me are more like people who answer questions, although of course with no personality or voice; the voice is mine as I ask them a question, and they respond in a "written" form, i.e. with a silent indication. So the diagrams people like to draw for databases and such don't make direct sense to me per se; they're just a way of organizing written information.

I am finding it quite difficult to coherently and correctly describe such things; no part of this do I have any certainty of, except that I know I don't imagine black-and-white box diagrams.

Comment by janos on The Trouble With "Good" · 2009-04-17T14:22:00.721Z · LW · GW

Do you have some good examples of abuse of Bayes' theorem?