Kelly and the Beast 2017-06-28T10:02:59.472Z · score: 0 (0 votes)


Comment by sen on Epistemic Laws of Motion · 2017-07-08T20:49:24.383Z · score: 0 (0 votes) · LW · GW

I don't see how your comment contradicts the part you quoted. More pressure doesn't lead to more change (in strategy) if resistance increases as well. That's consistent with what /u/SquirrelInHell stated.

Comment by sen on Epistemic Laws of Motion · 2017-07-08T08:29:25.055Z · score: 0 (0 votes) · LW · GW

That mass corresponds to "resistance to change" seems fairly natural, as does the correspondence between "pressure to change" and impulse. The strange part seems to be the correspondence between "strategy" and velocity. Distance would be something like strategy * time.

Does a symmetry in time correspond to a conservation of energy? Is energy supposed to correspond to resistance? Maybe, though that's hard to interpret, which makes it difficult to apply Lagrangian or Hamiltonian mechanics. The interpretation of energy is important. Without it, the interpretation of time is incomplete and possibly incoherent.

Is there an inverse correspondence between optimal certainty in resistance * strategy (momentum) and optimal certainty in strategy * time (distance)? I guess so, in which case findings from quantum uncertainty principles and information geometry may apply.

Does strategy impact one's perception of "distances" (strategy * time) and timescales? Maybe, so maybe findings from special relativity would apply. A universally-observable distance isn't defined though, and that precludes a more coherent application of special/general relativity. Some universal observables should be stated. Other than the obvious objectivity benefits, this could help more clearly define relationships between variables of different dimensions. This one isn't that important, but it would enable much more interesting uses of the theory.

Comment by sen on The Unreasonable Effectiveness of Certain Questions · 2017-07-05T04:55:03.617Z · score: 0 (0 votes) · LW · GW

The process you went through is known in other contexts as decategorification. You attempted to reduce the level of abstraction, noticed a potential problem in doing so, and concluded that the more abstract notion was not as well-conceived as you imagined.

If you try to enumerate questions related to a topic (Evil), you will quickly find that you (1) repeatedly tread the same ground, (2) are often unable to combine findings from multiple questions in useful ways, and (3) are often unable to identify questions worth answering, let alone a hierarchy that suggests which questions might be more worth answering than others.

What you are trying to identify are the properties and structure of Evil. A property of Evil is a thing that must be preserved in order for Evil to be Evil. The structure of Evil is the relationship between Evil and other (Evil or non-Evil) entities.

You should start by trying to identify the shape of Evil by identifying its border, where things transition from Evil to non-Evil and vice versa. This will give you an indication of which properties are important. From there, you can start looking at how Evil relates to other things, especially with regard to its properties. This will give you some indication of its structure. Properties are important for identifying Evil clearly. Structure is important for identifying things that are equivalent to Evil in all ways that matter. It is often the case that the two are not the same.

If you want to understand this better, I recommend looking into category theory. The general process of identifying ambiguities, characterizing problems in the right way, applying prior knowledge, and gluing together findings into a coherent whole is fairly well-worn. You don't have to start from scratch.

Comment by sen on We need a better theory of happiness and suffering · 2017-07-05T00:08:35.766Z · score: 1 (1 votes) · LW · GW

"But hold up", you say. "Maybe that's true for special cases involving competing subagents, ..."

I don't see how the existence of subagents complicates things in any substantial way. If the existence of competing subagents is a hindrance to optimality, then one should aim to align or eliminate subagents. (Isn't this one of the functions of meditation?) Obviously this isn't always easy, but the goal is at least clear in this case.

It is nonsensical to treat animal welfare as a special case of happiness and suffering. This is because animal happiness and suffering can only be understood through analogical reasoning, not through logical reasoning. A logical framework of welfare can only be derived through subjects capable of conveying results since results are subjective. The vast majority of animals, at least so far, cannot convey results, so we need to infer results on animals based on similarities between animal observables and human observables. Such inference is analogical and necessarily based entirely on human welfare.

If you want a theory of happiness and suffering in the intellectual sense (where physical pleasure and suffering are ignored), I suspect what you want is a theory of the ideals towards which people strive. For such an endeavor, I recommend looking into category theory, in which ideals are easily recognizable, and whose ideals seem to very closely (if not perfectly) align with intuitive notions.

Comment by sen on Idea for LessWrong: Video Tutoring · 2017-07-02T13:05:51.376Z · score: 0 (0 votes) · LW · GW

I meant it as "This seems like a clear starting point." You're correct that I think it's easy to not get lost with those two starting points.

In my experience with other fields, it's easy to get frustrated and give up. Getting lost is quite a bit more rare. You'll have to click through a hundred dense links to understand your first paper in machine learning, as with any other field. If you can trudge through that, you'll be fine. If you can't, you'll at least know what to ask.

Also, are you not curious about how much initiative people have regarding the topics they want to learn?

Comment by sen on Idea for LessWrong: Video Tutoring · 2017-07-02T08:38:56.399Z · score: 0 (0 votes) · LW · GW

A question for people asking for machine learning tutors: have you tried just reading through OpenAI blog posts and running the code examples they embed or link? Or going through the TensorFlow tutorials?

Comment by sen on Open thread, June 26 - July 2, 2017 · 2017-07-02T01:12:58.089Z · score: 0 (0 votes) · LW · GW

Yes. I follow authors, I ask avid readers similar to me for recommendations, I observe best-of-category polls, I scan through collections of categorized stories for topics that interest me, I click through "Also Liked" and "Similar" links for stories I like. My backlog of things to read is effectively infinite.

Comment by sen on What useless things did you understand recently? · 2017-07-02T00:28:44.853Z · score: 1 (1 votes) · LW · GW

I see. Thanks for the explanation.

Comment by sen on What useless things did you understand recently? · 2017-07-01T20:55:23.489Z · score: 0 (0 votes) · LW · GW

How so? I thought removing the border on each negation was the right way.

I gave an example of where removing the border gives the wrong result. Are you asking why "A is a subset of Not Not A" is true in a Heyting algebra? I think the proof goes like this:

  • (1) (a and not(a)) = 0
  • (2) By #1, (a and not(a)) is a subset of 0
  • (3) For all c,x,b, ((c and x) is a subset of b) = (c is a subset of (x implies b))
  • (4) By #2 and #3, a is a subset of (not(a) implies 0)
  • (5) For all c, not(c) = (c implies 0)
  • (6) By #4 and #5, a is a subset of not(not(a))
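The proof can be sanity-checked on a concrete Heyting algebra. The sketch below is a hypothetical example, not from the original discussion: it uses the open sets of a small made-up topology on {0, 1, 2}, with "subset" as the order, intersection as "and", and implication computed by brute force as the largest open set u such that (u and a) is a subset of b. It checks step (3) and the conclusion of step (6) for every element, and shows that the converse can fail.

```python
from itertools import product

POINTS = frozenset({0, 1, 2})
# A hypothetical finite topology on POINTS (closed under unions and intersections).
OPENS = [
    frozenset(),
    frozenset({0}),
    frozenset({1}),
    frozenset({0, 1}),
    POINTS,
]
BOTTOM = frozenset()  # plays the role of 0


def implies(a, b):
    """Largest open u with (u & a) <= b: the union of every open set satisfying it."""
    result = frozenset()
    for u in OPENS:
        if (u & a) <= b:
            result |= u
    return result


def neg(a):
    # Step (5): not(a) = (a implies 0).
    return implies(a, BOTTOM)


def adjunction_holds():
    # Step (3): ((c and x) <= b) == (c <= (x implies b)) for all c, x, b.
    return all(
        ((c & x) <= b) == (c <= implies(x, b))
        for c, x, b in product(OPENS, repeat=3)
    )


def double_negation_law_holds():
    # Step (6): a <= not(not(a)) for every element a.
    return all(a <= neg(neg(a)) for a in OPENS)


if __name__ == "__main__":
    print(adjunction_holds(), double_negation_law_holds())
    a = frozenset({0, 1})
    # not(not(a)) is the whole space here, so "Not Not A" is strictly bigger than "A".
    print(sorted(neg(neg(a))), sorted(a))
```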

Maybe your method is workable when you interpret a Heyting subset to be a topological superset? Then 1 is the initial (empty) set and 0 is the terminal set. That doesn't work with intersections though. "A and Not A" must yield 0, but the intersection of two non-terminal sets cannot possibly yield a terminal set. The union can though, so I guess that means you'd have to represent And with a union. That still doesn't work though because "Not A and Not Not A" must yield 0 in a Heyting algebra, but it's missing the border of A in the topological method, so it again isn't terminal.

I don't see how the topological method is workable for this.

Comment by sen on What useless things did you understand recently? · 2017-07-01T07:48:15.518Z · score: 0 (0 votes) · LW · GW

I guess today I'm learning about Heyting algebras too.

I don't think that circle method works. "Not Not A" isn't necessarily the same thing as "A" in a Heyting algebra, though your method suggests that they are the same. You can try to fix this by adding or removing the circle borders through negation operations, but even that yields inconsistent results. For example, if you add the border on each negation, "A or Not A" yields 1 under your method, though it should not in a Heyting algebra. If you remove the border on each negation, "A is a subset of Not Not A" is false under your method, though it should yield true.
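For a concrete counterexample, the three-element chain 0 < 0.5 < 1 is the smallest Heyting algebra that isn't Boolean. In the sketch below (the middle element 0.5 is an illustrative choice, not from the thread), "and" is min, "or" is max, and "a implies b" is 1 when a is at most b, and b otherwise. In it, "Not Not A" differs from "A" and "A or Not A" is not 1, yet "A is a subset of Not Not A" still holds for every element.

```python
BOTTOM, MIDDLE, TOP = 0.0, 0.5, 1.0  # 0, a hypothetical middle element, and 1
ELEMENTS = (BOTTOM, MIDDLE, TOP)


def meet(a, b):
    return min(a, b)  # "and"


def join(a, b):
    return max(a, b)  # "or"


def implies(a, b):
    # In a chain, the largest c with min(c, a) <= b is TOP if a <= b, else b itself.
    return TOP if a <= b else b


def neg(a):
    return implies(a, BOTTOM)  # "Not A" = "A implies 0"


if __name__ == "__main__":
    a = MIDDLE
    print("Not Not A == A?", neg(neg(a)) == a)  # double negation is not the identity here
    print("A or Not A == 1?", join(a, neg(a)) == TOP)  # excluded middle fails here
    print("A <= Not Not A for all A?", all(x <= neg(neg(x)) for x in ELEMENTS))
```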

I think it's easier to think of Heyting algebra in terms of functions and arguments. "A implies B" is a function that takes an argument of type A and produces a value of type B. 0 is null. "A and B" is the set of arguments a,b where a is of type A and b is of type B. If null is in the argument list, then the whole argument list becomes null. "Not A" is a function that takes an argument of type A and produces 0. "Not Not A" can be thought of in two ways: (1) it takes an argument of type Not A and produces 0, or (2) it takes an argument of type [a function that takes an argument of type A and produces 0] and produces 0.

If "(A and B and C and ...) -> 0" then "A -> (B -> (C -> ... -> 0))". If you've worked with programming languages where lambda functions are common, this is currying: turning a function of 2 arguments into a function of 1 argument that, once you fix that first argument, returns a function of the remaining one.
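A minimal Python sketch of that correspondence, assuming ordinary Python functions stand in for implications: a function on a pair ("A and B") corresponds to a function that takes an A and returns a function on B. The names `curry` and `uncurry` are illustrative helpers, not a library API.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def curry(f: Callable[[A, B], C]) -> Callable[[A], Callable[[B], C]]:
    """(A and B) -> C  becomes  A -> (B -> C)."""
    return lambda a: lambda b: f(a, b)


def uncurry(g: Callable[[A], Callable[[B], C]]) -> Callable[[A, B], C]:
    """A -> (B -> C)  becomes  (A and B) -> C."""
    return lambda a, b: g(a)(b)


def subtract(a: int, b: int) -> int:
    return a - b


if __name__ == "__main__":
    minus_from_10 = curry(subtract)(10)  # fixing the first argument leaves a 1-argument function
    print(minus_from_10(3))
    print(uncurry(curry(subtract))(10, 3))
```

The two helpers are inverses of each other, which is the programming-language face of the equivalence between "(A and B) -> C" and "A -> (B -> C)".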

I don't see it on the Wikipedia page, but I'd guess that "A or B" means "(Not B implies A) and (Not A implies B)".

If you don't already, I highly recommend studying category theory. Most abstract mathematical concepts have simple definitions in category theory. The category theoretic definition of Heyting algebras on Wikipedia consists of 6 lines, and it's enough to understand all of the above except the Or relation.

Comment by sen on Kelly and the Beast · 2017-06-30T08:12:17.537Z · score: 0 (0 votes) · LW · GW

I am, and thanks for answering. Keep in mind that there are ways to make your intuition more reliable, if that's a thing you want.

Comment by sen on Kelly and the Beast · 2017-06-30T07:40:31.802Z · score: 0 (0 votes) · LW · GW

Fair enough. I have a question then. Do you personally agree with Bob?

Comment by sen on Kelly and the Beast · 2017-06-30T07:38:31.475Z · score: 0 (0 votes) · LW · GW

Algebraic reasoning is independent of the number system used. If you are reasoning about utility functions in the abstract and if your reasoning does not make use of any properties of numbers, then it doesn't matter what numbers you use. You're not using any properties of finite numbers to define anything, so the fact of whether or not these numbers are finite is irrelevant.

Comment by sen on Kelly and the Beast · 2017-06-30T07:25:53.177Z · score: 0 (0 votes) · LW · GW

The original post doesn't require arbitrarily fine distinctions, just 2^trillion distinctions. That's perfectly finite.

Your comment about Bob not assigning a high utility value to anything is equivalent to a comment stating that Bob's utility function is bounded.

Comment by sen on Kelly and the Beast · 2017-06-30T07:01:50.635Z · score: 0 (0 votes) · LW · GW

It can make sense to say that a utility function is bounded, but that implies certain other restrictions. For example, bounded utility functions cannot be decomposed into independent (additive or multiplicative; these are the only two options) subcomponents if the number of subcomponents is unknown. Any utility function that is summed or multiplied over an unknown number of independent (e.g.) societies must be unbounded*. Does that mean you believe that utility functions can't be aggregated over independent societies or that no two societies can contribute independently to the utility function? The latter implies that a utility function cannot be determined without knowing about all societies, which would make the concept useless. Do you believe that utility functions can be aggregated at all beyond the individual level?

* Keep in mind that "unbounded" here means "arbitrarily additive". In the multiplicative case, even if a utility function is always less than 1, if an individual's utility can be made arbitrarily close to 0, then it's still unbounded. Such an individual still has enough to gain by betting on a trillion coin tosses.

You mentioned that a utility function should be seen as a proxy to decision making. If decisions can be independent, then their contributions to the definition of a utility function must be independent*. If the utility function is bounded, then the number of independent decisions something can decide between must also be bounded. Maybe that makes sense for individuals since you distinguished a utility function as a summary of "current" decision-making, and any individual is presumably limited in their ability to decide between independent outcomes at any given point in time. Again, though, this causes problems for aggregate utility functions.

* Consider the functor F that takes any set of decisions (with inclusion maps between them) to the least-assuming utility function consistent with them. There exists a functor G that takes any utility function to the maximal set of decisions derivable from it. F and G together form a contravariant adjunction between sets of decisions and utility functions, with F left-adjoint to G. Therefore F sends finite coproducts to finite products. Therefore for any disjoint union of decisions A,B, the least-assuming utility function defined over them exists and is F(A+B)=F(A)*F(B). The proof is nearly identical for covariant adjunctions.

It seems like nonsense to say that utility functions can't be aggregated. A model of arbitrary decision making shouldn't suddenly become impossible just because you're trying to model, say, three individuals rather than one. The aggregate has preferential decision making just like the individual.

Comment by sen on Kelly and the Beast · 2017-06-29T17:34:57.855Z · score: 0 (0 votes) · LW · GW

Also it's unclear to me what the connection is between this part and the second.

My bad, I did a poor job explaining that. The first part is about the problems of using generic words (evolution) with fuzzy decompositions (mates, predators, etc.) to come to conclusions, which can often be incorrect. The second part is about decomposing those generic words into their implied structure, and matching that structure to problems in order to get a more reliable fit.

I don't believe that "I don't know" is a good answer, even if it's often the correct one. People have vague intuitions regarding phenomena, and wouldn't it be nice if they could apply those intuitions reliably? That requires a mapping from the intuition (evolution is responsible) to the problem, and the mapping can only be made reliable once the intuition has been properly decomposed into its implied structure, and even then, only if the mapping is based on the decomposition.

I started off by trying to explain all of that, but realized that there is far too much when starting from scratch. Maybe someday I'll be able to write that post...

Comment by sen on Kelly and the Beast · 2017-06-29T17:25:32.380Z · score: 0 (0 votes) · LW · GW

The cell example is an example of evolution being used to justify contradictory phenomena. The exact same justification is used for two opposing conclusions. If you thought there was nothing wrong with those two examples being used as they were, then there is something wrong with your model. They literally use the exact same justification to come to opposing conclusions.

The second set of explanations have fewer, more reliably-determinable dependencies, and their reasoning is more generally applicable.

That is correct, they have zero prediction and compression power. I would argue that the same can be said of many cases where people misuse evolution as an explanation.

When people falsely pretend to have knowledge of some underlying structure or correlate, they are (1) lying and (2) increasing noise, which by various definitions is negative information. When people use evolution as an explanation in cases where it does not align with the implications of evolution, they are doing so under a false pretense. My suggested approach (1) is honest and (2) conveys information about the lack of known underlying structure or correlate.

I don't know what you mean by "sensible definition". I have a model for that phrase, and yours doesn't seem to align with mine.

Comment by sen on Kelly and the Beast · 2017-06-29T05:17:51.586Z · score: 1 (1 votes) · LW · GW

Would your answer change if I let you flip the coin until you lost? Based on your reasoning, it should not. Despite it being an effectively-guaranteed extinction, the infinitesimal chance is overwhelmed by the gains in the case of infinitely many good coin flips.

I would not call the Kelly strategy risk-averse. I imagine that word to mean "grounded in a fantasy where risk is exaggerated". I would call the second strategy risk-prone. The difference is that the Kelly strategy ends up being the better choice in realistic cases, whereas the second strategy ends up being the better choice in the extraordinarily rare wishful cases. In that sense, I see this question as one that differentiates people that prefer to make decisions grounded in reality from those that prefer to make decisions grounded in wishful thinking. The utilitarian approach then is prone to wishful thinking.

Still, I get your point. There may exist a low-chance scenario for which I would, with near certainty, trade the Kelly-heaven world for a second-hell world. To me, that means there exists a scenario that could lull me into gambling on wildly-improbable wishful thinking. Though such scenarios may exist, and though I may bet on such scenarios when presented with them, I don't believe it's reasonable to bet on them. I can't tell if you literally believe that it's reasonable to bet on such scenarios or if you're imagining something wholly different from me.
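The trade-off between the two strategies can be made concrete with a small Python sketch. It assumes, hypothetically, a repeated even-money bet on a coin that lands in your favor with probability p = 0.6 (the original post's exact setup isn't reproduced here). For such a bet the Kelly fraction is 2p - 1, which maximizes expected log-growth; betting everything maximizes expected wealth but survives n flips only with probability p^n.

```python
import math


def kelly_fraction(p: float) -> float:
    """Kelly fraction of wealth to stake on an even-money bet won with probability p."""
    return max(0.0, 2 * p - 1)


def log_growth(p: float, f: float) -> float:
    """Expected log-growth per flip when staking fraction f (0 <= f <= 1), assuming 0 < p < 1."""
    if f >= 1:
        return -math.inf  # one loss leaves zero wealth, and log(0) is -infinity
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)


def expected_wealth(p: float, f: float, n: int) -> float:
    """Mean wealth after n independent flips, starting from wealth 1."""
    return (p * (1 + f) + (1 - p) * (1 - f)) ** n


def survival_probability(p: float, f: float, n: int) -> float:
    """Chance of never going broke; only the all-in strategy (f = 1) can go broke."""
    return p ** n if f >= 1 else 1.0


if __name__ == "__main__":
    p, n = 0.6, 100  # hypothetical coin bias and number of flips
    for name, f in (("Kelly", kelly_fraction(p)), ("all-in", 1.0)):
        print(
            f"{name:>6}: f={f:.2f}  log-growth={log_growth(p, f):.4f}  "
            f"mean wealth={expected_wealth(p, f, n):.3g}  "
            f"P(survive)={survival_probability(p, f, n):.3g}"
        )
```

The all-in strategy wins on mean wealth only because of the vanishingly rare run of straight wins, which is the "wishful" case described above.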

Comment by sen on Kelly and the Beast · 2017-06-28T16:37:29.974Z · score: 0 (0 votes) · LW · GW

Dagon: You can artificially bound utility to some arbitrarily low "bankruptcy" point. The lack of a natural one isn't relevant to the question of whether a utility function makes sense here. On treating utility as a resource, if you can make decisions to increase or decrease utility, then you can play the game. Your basic assumption seems to be that people can't meaningfully make decisions that change utility, at which point there is no point in measuring it, as there's nothing anyone can do about it.

The question of unintuitively high utilities and upper-bounded utilities deserves another post, I believe.

Comment by sen on Which areas of rationality are underexplored? - Discussion Thread · 2016-12-10T13:02:16.688Z · score: 2 (2 votes) · LW · GW

Regarding the Buckingham Pi Theorem (BPT), I think I can double my recommendation that you try to understand the Method of Lagrange Multipliers (MLM) visually. I'll try to explain in the following paragraph knowing that it won't make much sense on first reading.

For the Method of Lagrange Multipliers, suppose you have some number of equations in n variables. Consider the n-dimensional space containing the set of all solutions to those equations. The set of solutions describes a k-dimensional manifold (meaning the surface of the manifold forms a k-dimensional space), where k is n minus the number of independent equations you have. The set of all vectors perpendicular to this manifold at a point (the null space, or the space of vectors that, projected onto the manifold, give the zero vector) can be described by an (n-k)-dimensional space. Any (n-k)-dimensional space can be generated (by vector scaling and vector addition) from (n-k) independent vectors. For the Buckingham Pi Theorem, replace each vector with a matrix/group, vector scaling with exponentiation, and vector addition with multiplication. Your Buckingham Pi exponents are Lagrange multipliers, and your Pi groups are Lagrange perpendicular vectors (the gradient/normal vectors of your constraints/dimensions).
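A small worked case may make the "perpendicular space" picture concrete. The Python sketch below uses a hypothetical problem not taken from the thread: minimize f = x^2 + y^2 + z^2 subject to g1 = x + y + z = 1 and g2 = x - y = 1. Here n = 3 with two independent constraints, so the solution set is a line (k = 1) and the perpendicular space at each point is spanned by the n - k = 2 gradients grad g1 = (1, 1, 1) and grad g2 = (1, -1, 0). At the optimum, grad f lies in that span, and the multipliers lam1, lam2 are its coordinates. Because f is quadratic and the constraints are linear, the Lagrange conditions form a linear system, solved here exactly with fractions.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a must be square and non-singular)."""
    n = len(a)
    # Augmented matrix of exact rationals.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row at or below `col` with a non-zero entry in this column.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Normalize the pivot row, then clear this column from every other row.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


# Unknowns: x, y, z, lam1, lam2.
# Stationarity: grad f = lam1 * grad g1 + lam2 * grad g2, plus the two constraints.
A = [
    [2, 0, 0, -1, -1],  # d/dx: 2x = lam1 + lam2
    [0, 2, 0, -1, 1],   # d/dy: 2y = lam1 - lam2
    [0, 0, 2, -1, 0],   # d/dz: 2z = lam1
    [1, 1, 1, 0, 0],    # g1: x + y + z = 1
    [1, -1, 0, 0, 0],   # g2: x - y = 1
]
B = [0, 0, 0, 1, 1]

if __name__ == "__main__":
    x, y, z, lam1, lam2 = solve_linear(A, B)
    print("point:", x, y, z)
    print("multipliers:", lam1, lam2)
```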

I guess in that sense, I can see why people would make the jump to Lie groups. The Pi groups / basis vectors generate any other vector in that dimensionless space, and they're obviously invertible. Honestly, I haven't spent much time with Lie groups and Lie algebras, so I can't tell you why they're useful. If my earlier explanation of dimensionless quantities holds (which, after seeing the Buckingham Pi Theorem, I'm even more convinced that it does), then it has something to do with symmetry with respect to scale. The reason I say "scale" as opposed to any other x * x → x quantity is that the scale kind of dimensionlessness seems to pop up in a lot of dimensionless quantities specific to fluid dynamics, including the Reynolds number.
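The vector analogy can be checked directly in a few lines of Python. In this sketch (assuming standard mass, length, and time dimensions; the quantity names are illustrative), a product of powers of physical quantities is stored as a dict of exponents, so multiplying groups adds exponent vectors and raising a group to a power scales its exponent vector. The Reynolds number rho * v * L / mu has exponent vector (1, 1, 1, -1), which lies in the null space of the dimension matrix. With four quantities and three independent dimensions, the Buckingham Pi Theorem gives 4 - 3 = 1 independent Pi group, so every other dimensionless combination of these four quantities is a power of it.

```python
# Dimension exponents (mass, length, time) for each quantity, e.g. density is kg / m^3.
DIMENSIONS = {
    "rho": (1, -3, 0),   # density
    "v": (0, 1, -1),     # flow speed
    "L": (0, 1, 0),      # characteristic length
    "mu": (1, -1, -1),   # dynamic viscosity
}


def dimension_of(group):
    """(M, L, T) exponents of the product of quantity ** exponent over the group."""
    total = [0, 0, 0]
    for name, exponent in group.items():
        for i, d in enumerate(DIMENSIONS[name]):
            total[i] += exponent * d
    return tuple(total)


def multiply(g, h):
    """Multiplying two groups adds their exponent vectors."""
    return {name: g.get(name, 0) + h.get(name, 0) for name in set(g) | set(h)}


def power(g, k):
    """Raising a group to the power k scales its exponent vector by k."""
    return {name: k * e for name, e in g.items()}


REYNOLDS = {"rho": 1, "v": 1, "L": 1, "mu": -1}  # Re = rho * v * L / mu

if __name__ == "__main__":
    print(dimension_of(REYNOLDS))
    print(dimension_of(multiply(REYNOLDS, power(REYNOLDS, -2))))
```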

Sorry, I know that didn't make much sense. I'm pretty sure it will though once you go through the recommendations in my earlier reply.

Regarding the Reynolds number, I suspect you're not going to see the difference between the dimensional and the dimensionless quantities until you try solving that differential equation at the bottom of the page. Try it both with and without converting to dimensionless quantities, and make sure to keep track of the semantics of each term as you go through the process. Here's one that's worked out for the dimensionless case. If you try solving it for the dimensional case, you should see the problem.

It's getting really late. I'll go through your comments on similarity variables in a later reply.

Thanks for the references and your comments. I've learned a lot from this discussion.

Comment by sen on Which areas of rationality are underexplored? - Discussion Thread · 2016-12-09T10:44:07.842Z · score: 1 (1 votes) · LW · GW

See my response below to WhySpace on getting started with group theory through category theory. For any space-oriented field, I also recommend looking at the topological definition of a space. Also, for any calculus-heavy field, I recommend meditating on the Method of Lagrange Multipliers if you don't already have a visual grasp of it.

I don't know of any resource that tackles the problem of developing models via group theory. Developing models is a problem of stating and applying analogies, which is a problem in category theory. If you want to understand that better, you can look through the various classifications of functors since the notion of a functor translates pretty accurately to "analogy".

I have no background in fluid dynamics, so please filter everything I say here through your own understanding, and please correct me if I'm wrong somewhere.

I don't think there's any inherent relationship between dimensionless parameters and group theory. The reason is that dimensionless quantities can refer to too many things (i.e., they're not really dimensionless, and different dimensionlessnesses have different properties... or rather they may be dimensionless, but they're not typeless). Consider that the !∘sqrt∘ln of a dimensionless quantity is also technically a dimensionless quantity while also being almost-certainly useless and uninterpretable. I suppose if you can rewrite an equation in terms of dimensionless quantities whose relationships are restricted to have certain properties, then you can treat them like other well-known objects, and you can throw way more math at them.

For example, suppose your "dimensionless" quantity is a scaling parameter such that scale * scale → scale (the product of two scaling operations is equivalent to a single scaling operation). By converting your values to scales, you've gained a new operation to work with due to not having to re-translate your quantities on each successive multiplication: element-wise exponentiation. I'd personally see that as a gateway to applying generating series (because who doesn't love generating series?), but I guess a more mechanics-y application of that would be solving differential equations, which often require exponentiating things.

Any time you have a set of X quantities that can be applied to one another to get another of the X quantities, you have the start of a group; you have a full group when that operation is also associative, has an identity, and has inverses. That's what's going on with the scaling example (x * x → x), and that's what's not going on with the !∘sqrt∘ln example. The scaling example just happens to be a particularly simple example of a group. You get less trivial examples when you have multiple "dimensionless" quantities that can interact with one another in standard ways. For example, if vector addition, scaling, and dot products are sensible, your vectors can form an inner product space (a Hilbert space, if it is complete), and you can use wonderful things like angles and vector calculus to meaningful effect.
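As a small illustration of the scaling example, here is a Python sketch that spot-checks the group axioms (associativity, identity, inverses, closure) for scale factors under composition, using exact fractions on a hypothetical sample of scales. A finite check like this is evidence, not a proof. It also shows the element-wise exponentiation mentioned earlier: scaling by s three times is the same as one scaling by s ** 3.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample of scale factors (positive rationals).
SCALES = [Fraction(1, 2), Fraction(1), Fraction(2), Fraction(3, 4)]
IDENTITY = Fraction(1)  # "don't rescale"


def compose(s, t):
    """Scaling by t and then by s equals a single scaling by s * t."""
    return s * t


def inverse(s):
    """Undo a scaling by s."""
    return 1 / s


def check_group_axioms(sample):
    """Spot-check the group axioms on a finite sample (evidence, not a proof)."""
    associative = all(
        compose(compose(a, b), c) == compose(a, compose(b, c))
        for a, b, c in product(sample, repeat=3)
    )
    has_identity = all(compose(s, IDENTITY) == s == compose(IDENTITY, s) for s in sample)
    has_inverses = all(compose(s, inverse(s)) == IDENTITY for s in sample)
    closed = all(compose(a, b) > 0 for a, b in product(sample, repeat=2))
    return associative and has_identity and has_inverses and closed


if __name__ == "__main__":
    print(check_group_axioms(SCALES))
    s = Fraction(3, 2)
    print(compose(s, compose(s, s)) == s ** 3)
```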

I can probably give a better answer if I know more precisely what you're referring to. Do you have examples of fluid dynamicists simplifying equations and citing group theory as the justification?

Comment by sen on Beware of identifying with school of thoughts · 2016-12-06T09:23:08.202Z · score: 0 (0 votes) · LW · GW

Or is it that a true sophisticate would consider where to apply sophistry and where not to?

Comment by sen on Making intentions concrete - Trigger-Action Planning · 2016-12-06T09:11:27.090Z · score: 0 (0 votes) · LW · GW

Information on the discussion board is front-facing for some time, then basically dies. Yes, you can use the search to find it again, but that becomes less reliable as discussion of TAPs increases. It's also antithetical to the whole idea behind TAP.

The wiki is better suited for acting as a repository of information.

Comment by sen on Beware of identifying with school of thoughts · 2016-12-06T07:12:37.138Z · score: 0 (0 votes) · LW · GW

I don't understand what point you're making with the computer, as we seem to be in complete agreement there. Nothing about the notion of ideals and definitions suggests that computers can't have them or their equivalent. It's obvious enough that computers can represent them, as you demonstrated with your example of natural numbers. It's obvious enough that neurons and synapses can encode these things, and that they can fire in patterned ways based on them because... well, that's what neurons do, and neurons seem to be doing the bulk of the heavy lifting as far as thinking goes.

Where we disagree is in saying that all concepts that our neurons recognize are equivalent and that they should be reasoned about in the same way. There are clearly some notions that we recognize as being valid only after seeing sufficient evidence. For these notions, I think Bayesian reasoning is perfectly well-suited. There are also clearly notions we recognize as being valid for which no evidence is required. For these, I think we need something else. For these notions, only usefulness is required, and sometimes not even that. Bayesian reasoning cannot deal with this second kind because their acceptability has nothing to do with evidence.

You argue that this second kind is irrelevant because these things exist solely in people's minds. The problem is that the same concepts recur again and again in many people's minds. I think I would agree with you if we only ever had to deal with a physical world in which people's minds did not matter all that much, but that's not the world we live in. If you want to be able to reliably convey your ideas to others, if you want to understand how people think at a more fundamental level, if you want your models to be useful to someone other than yourself, if you want to develop ideas that people will recognize as valid, if you want to generalize ideas that other people have, if you want your thoughts to be integrated with those of a community for mutual benefit, then you cannot ignore these abstract patterns because these abstract patterns constitute such a vast amount of how people think.

It also, incidentally, has a tremendous impact on how your own brain thinks and the kinds of patterns your brain lets you consciously recognize. If you want to do better generalizing your own ideas in reliable and useful ways, then you need to understand how they work.

For what it's worth, I do think there are physically-grounded reasons for why this is so.

Comment by sen on Which areas of rationality are underexplored? - Discussion Thread · 2016-12-06T06:19:54.539Z · score: 1 (1 votes) · LW · GW

"Group" is a generalization of "symmetry" in the common sense.

I can explain group theory pretty simply, but I'm going to suggest something else. Start with category theory. It is doable, and it will give you the magical ability of understanding many math pages on Wikipedia, or at least the hope of being able to understand them. I cannot overstate how large an advantage this gives you when trying to understand mathematical concepts. Also, I don't believe starting with group theory will give you any advantage when trying to understand category theory, and you're going to want to understand category theory if you're interested in reasoning.

When I was getting started with category theory, I went back and forth between several pages (Category Theory, Functor, Universal Property, Universal Object, Limits, Adjoint Functors, Monomorphism, Epimorphism). Here are some of the insights that made things click for me:

  • An "object" in category theory corresponds to a set in set theory. If you're a programmer, it's easier to think of a single categorical object as a collection (class) of OOP objects. It's also valid and occasionally useful to think of a single categorical object as a single OOP object (e.g., a collection of fields).
  • A "morphism" in category theory corresponds to a function in set theory. If you think of a categorical object as a collection of OOP objects, then a morphism takes as input a single OOP object at a time.
  • It's perfectly valid for a diagram to contain the same categorical object twice. Diagrams only show relations, and it's perfectly valid for an OOP object to be related to another OOP object of the same class. When looking at commutative diagrams that seem to contain the same categorical object twice, think of them as distinct categorical objects.
  • Diagrams don't only show relationships between OOP objects. They can also show relationships between categorical objects. For example, a diagram might state that there is a bijection between two categorical objects.
  • You're not always going to have a natural transformation between two functors between the same pair of categories.
  • When trying to understand universal properties, the following mapping is useful (look at the diagrams on Wikipedia): A is the Platonic Form of Y, U is a fire that projects only some subset of the aspects of being like A.
  • The duality between categorical objects and OOP objects is critical to understanding the difference between any diagram and its dual (reversed-morphisms). Recognizing this makes it much easier to understand limits and colimits.
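To make the object/morphism correspondence concrete, here is a loose Python sketch. It treats Python types such as `str` and `int` as categorical objects (collections of values), ordinary one-argument functions as morphisms, and the list type constructor as a functor. Python doesn't enforce the category laws, so the checks only spot-check them on sample inputs. The names `compose`, `identity`, and `fmap` are illustrative, not a library API.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def identity(x: A) -> A:
    """Identity morphism: every object has one."""
    return x


def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """Composite morphism 'g after f'; it still takes one value at a time."""
    return lambda x: g(f(x))


def fmap(f: Callable[[A], B]) -> Callable[[List[A]], List[B]]:
    """The list functor on morphisms: a function on values becomes a function on lists."""
    return lambda xs: [f(x) for x in xs]


def double(n: int) -> int:
    return 2 * n


if __name__ == "__main__":
    words = ["category", "theory"]
    # Functor laws: fmap preserves identities and composition (len is a str -> int morphism).
    print(fmap(identity)(words) == words)
    print(fmap(compose(double, len))(words) == compose(fmap(double), fmap(len))(words))
```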

Once you understand these things, you'll have the basic language down to understand group theory without much difficulty.

Comment by sen on Beware of identifying with school of thoughts · 2016-12-06T05:31:20.912Z · score: 1 (1 votes) · LW · GW

The distinction between "ideal" and "definition" is fuzzy the way I'm using it, so you can think of them as the same thing for simplicity.

Symmetry is an example of an ideal. It's not a thing you directly observe. You can observe a symmetry, but there are infinitely many kinds of symmetries, and you have some general notion of symmetry that unifies all of them, including ones you've never seen. You can construct a symmetry that you've never seen, and you can do it algorithmically based on your idea of what symmetries are given a bit of time to think about the problem. You can even construct symmetries that, at first glance, would not look like a symmetry to someone else, and you can convince that someone else that what you've constructed is a symmetry.
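As a small illustration of constructing symmetries algorithmically, here is a Python sketch that generates the symmetry group of a square from two generators, then independently verifies that every element preserves the square's edges (the "convince someone else" step). Labeling the vertices 0 to 3 and choosing these particular generators are assumptions for the example.

```python
from itertools import permutations

# Vertices of a square labeled 0..3 in cyclic order; edges join neighbors.
EDGES = {frozenset((i, (i + 1) % 4)) for i in range(4)}


def compose(p, q):
    """Permutation composition (p after q): vertex i goes to p[q[i]]."""
    return tuple(p[j] for j in q)


def generate_group(generators):
    """Close a set of permutations under composition, starting from the identity."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            k = compose(h, g)
            if k not in group:
                group.add(k)
                frontier.append(k)
    return group


def is_square_symmetry(p):
    """A relabeling is a symmetry if it maps the edge set onto itself."""
    return {frozenset((p[a], p[b])) for a, b in map(tuple, EDGES)} == EDGES


rotation = (1, 2, 3, 0)    # vertex i -> i + 1 (mod 4)
reflection = (0, 3, 2, 1)  # mirror across the diagonal through vertices 0 and 2
d4 = generate_group([rotation, reflection])

print(len(d4))                                                     # 8
print(all(is_square_symmetry(p) for p in d4))                      # True
print(sum(is_square_symmetry(p) for p in permutations(range(4))))  # 8
```

Only 8 of the 24 relabelings of the vertices preserve the square, and the two generators already produce all of them.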

The set of natural numbers is an example of something that's defined, not observed. Each natural number is defined sequentially, starting from 1.

Addition is an example of something that's defined, not observed. The general notion of a bottle is an ideal.
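To make "defined, not observed" concrete, here is a minimal Python sketch of the naturals built sequentially from 1 by a successor operation, with addition defined by recursion on the second argument. This is a Peano-style construction; starting from 1 follows the text above, while many presentations start from 0. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class One:
    """The first natural number, taken as given."""


@dataclass(frozen=True)
class Succ:
    """The number after `pred`."""
    pred: "Nat"


Nat = Union[One, Succ]


def add(m: Nat, n: Nat) -> Nat:
    """m + 1 = Succ(m);  m + Succ(k) = Succ(m + k)."""
    if isinstance(n, One):
        return Succ(m)
    return Succ(add(m, n.pred))


def from_int(k: int) -> Nat:
    """Build the k-th natural (k >= 1) by applying Succ k - 1 times to One."""
    if k < 1:
        raise ValueError("these naturals start at 1")
    n: Nat = One()
    for _ in range(k - 1):
        n = Succ(n)
    return n


def to_int(n: Nat) -> int:
    """Count how many steps it takes to get back to One."""
    count = 1
    while isinstance(n, Succ):
        n = n.pred
        count += 1
    return count


print(to_int(add(from_int(2), from_int(3))))  # 5
```

Nothing here was measured: every number, and what it means to add two of them, follows from the two rules in `add`.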

In terms of philosophy, an ideal is the Platonic Form of a thing. In terms of category theory, an ideal is an initial or terminal object, and a definition is a commutative diagram.

I didn't say these things weren't influenced by past observations and correlations. I said past observations and correlations were irrelevant for distinguishing them. Meaning, for example, you can distinguish between more natural numbers than your past experiences should allow.

Comment by sen on Making intentions concrete - Trigger-Action Planning · 2016-12-05T06:56:57.645Z · score: 0 (0 votes) · LW · GW

Fair enough, though I disagree with the idea of using the discussion board as a repository of information.

Comment by sen on Beware of identifying with school of thoughts · 2016-12-05T06:09:18.239Z · score: 0 (0 votes) · LW · GW

Is there ever a case where priors are irrelevant to a distinction or justification? That's the difference between pure Bayesian reasoning and alternatives.

OP gave the example of the function of organs for a different purpose, but it works well here. To a pure Bayesian reasoner, there is no difference between saying that the heart has a function and saying that the heart is correlated with certain behaviors, because priors alone are not sufficient to distinguish the two. Priors alone are not sufficient to distinguish the two because the distinction has to do with ideals and definitions, not with correlations and experience.

If a person has issues with erratic blood flow leading to some hospital visit, why should we look at the heart for problems? Suppose there were a problem found with the heart. Why should we address the problem at that level as opposed to fixing the blood flow issue in some more direct way? What if there was no reason for believing that the heart problem would lead to anything but the blood flow problem? What's the basis for addressing the underlying cause as opposed to addressing solely the issue that more directly led to a hospital visit?

There is no basis unless you recognize that addressing underlying causes tends to resolve issues more cleanly, more reliably, more thoroughly, and more persistently than addressing symptoms, and that the underlying cause can only be identified by distinguishing erroneous functioning from other abnormalities. Pure Bayesian reasoners can't make the distinction because the distinction has to do with ideals and definitions, not with correlations and experience.

It's really hard for me to see under what model of the world (correct) Bayesian analysis could be misleading.

If you wanted a model that was never misleading, you might as well use first-order logic to explain everything. Or go straight for the vacuous case and don't try to explain anything. The problem is that that doesn't generalize well, and it's too restrictive. It's about broadening your notion of reasoning so that you consider alternative justifications and more applications.

Comment by sen on Making intentions concrete - Trigger-Action Planning · 2016-12-04T14:21:08.336Z · score: 1 (1 votes) · LW · GW

That's not what OP says, and it's also a non-sequitur. Obviousness and intuitiveness do not imply that all goals should be turned into TAPs or that vague triggers shouldn't be used. It's obvious and intuitive to anyone who's flown on an airplane that the Earth is a spheroid, but that doesn't mean I should use geodesics to compute the best way to get to the grocery store.

TAPs are useful for people that have problems following through with intentions. OP mentions three example indicators of such problems. If you don't have problems following through, then there is no reason to "make all of your goals into TAPs" or "never think anything with a vague trigger". Putting effort into solving a non-problem is a waste.

To answer your question, hell no. It's clear why this would help certain people, but it's certainly not optimal for people who can look ahead a bit into the future, keep things in mind for later, or... you know, stick to things. The general idea behind TAPs is that people are lazy when they have planning left to do, and they can't remember to do things. Yes, I'll set up notes sometimes, but for the vast majority of things, my brain just reminds me without any explicit triggers. It's not like I feel lazy either, and I don't feel put off by non-concrete goals, not even a little.

I'm satisfied with my level of productivity, I don't get discouraged by planning or non-concrete goals, and I don't have trouble remembering to do things. TAPs have nothing to offer me other than ideas on how to model certain aspects of people's brains.

Comment by sen on Which areas of rationality are underexplored? - Discussion Thread · 2016-12-04T11:35:14.458Z · score: 0 (0 votes) · LW · GW

I don't think System 1 and System 2 are relevant, since they're not areas of rationality. It's the difference between a design and an implementation. I don't think this thread is about implementation optimizations, and I do see numerous threads on that topic.

Regarding double crux, I actually don't see that when I browse through the recent threads, even going back several pages. Through the site search, I was able to find another post that links to a November 29th thread, which I think is the one you're talking about.

Here's an excerpt from that double crux thread.

Ideally, B is a statement that is somewhat closer to reality than A—it's more concrete, grounded, well-defined, discoverable, etc. It's less about principles and summed-up, induced conclusions, and more of a glimpse into the structure that led to those conclusions.

(It doesn't have to be concrete and discoverable, though—often after finding B it's productive to start over in search of a C, and then a D, and then an E, and so forth, until you end up with something you can research or run an experiment on).

That's not out of context. The entire game description and recommendations are written with the focal point of increasing precision and making beliefs more concrete.

I want you to take the time to seriously consider whether you think I'm crazy for thinking that "increasing precision" and "making beliefs more concrete" could possibly be a bad thing when trying to understand how someone thinks. Think about what your gut reaction was when you read that. Think about what alternative there could be. Please don't read on until you're sure I'm just trolling, so maybe you can see how screwed up this place is.

How about doing the exact opposite? How about making things less precise? How about throwing away useless structure and making it easier to reason by analogy, thereby letting people expose the full brunt of their intuition and experience that really leads to their beliefs? How about making beliefs less concrete, and therefore more abstract, more general, and easier to see relationships in other domains?

If you convince someone that A really might not lead to B and that there are n experiments you could use to tell, whoop-de-doo, they are literally never going to use that again. If you discover that you believe uniforms lead to bullying because you mentally model social dynamics as particle systems, and bullying as a problem that occurs in high-chaos environments, and that uniforms go a long way in cooling the system, thereby reducing the chaos and bullying... That's probably going to stick with you for a while, despite being a completely ungrounded non-sequitur.

Comment by sen on Making intentions concrete - Trigger-Action Planning · 2016-12-03T13:21:39.607Z · score: 1 (1 votes) · LW · GW

Reading through this, it seems completely obvious and intuitive, and yet I see a lot of "thumbs up" (or whatever the LW colloquialism is). For the sake of metaoptimization, I have to ask... Has this post actually helped anyone here? Reading through the comments, it seems again like everyone already knew this, and that people are just commenting with their own experiences. If this post didn't actually help anyone here, the obvious follow-up question would be whether the "thumbs up" signal is actually conveying the intended meaning.

Comment by sen on Open thread, Nov. 28 - Dec. 04, 2016 · 2016-12-03T11:23:05.808Z · score: 2 (2 votes) · LW · GW

You don't place bets based solely on probabilities. You place bets based on probabilities, odds, timescales, investments, and alternative options. Specifically, you place bets to optimize for growth of principal with respect to time. What you're doing is not placing a bet. If you were placing a bet and wanted feedback, it would have been appropriate to provide a lot more information, such as what you expect to gain from your bet, what you expect to lose in the negative case, what you're hoping to optimize, what your expected costs are, and what alternatives you're considering spending your time or money on. It's not appropriate for you to provide any of that information because what you're doing is not placing a bet.
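One standard way to formalize "growth of principal with respect to time" is the Kelly criterion. The Python sketch below assumes the simplest case, a repeated binary bet with win probability `p` that pays `b` per unit staked; it deliberately ignores timescales and alternative options, which the comment says also matter. Function names and the example numbers are illustrative.

```python
import math


def kelly_fraction(p: float, b: float) -> float:
    """Fraction of bankroll that maximizes expected log growth for a binary bet.

    p: probability of winning; b: net odds (win b per unit staked, else lose the stake).
    Returns 0 when the bet has no positive edge.
    """
    return max(0.0, p - (1.0 - p) / b)


def log_growth_rate(f: float, p: float, b: float) -> float:
    """Expected log growth of the bankroll per bet when staking fraction f."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


p, b = 0.6, 1.0  # hypothetical: 60% chance to win an even-money bet
f_star = kelly_fraction(p, b)
print(round(f_star, 3))  # 0.2
for f in (0.1, f_star, 0.3, 0.5):
    print(f"stake {f:.2f}: growth {log_growth_rate(f, p, b):+.4f} per bet")
```

Staking half the bankroll here gives negative expected log growth even though each individual bet has positive expected value, which is why the probability of winning alone doesn't tell you how to bet.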

What you're doing is panicking and looking for an echo to tell you that your beliefs are sensible, that the world really is crashing, and that what you're doing is justified. Your beliefs are not sensible, and the world isn't really crashing. I don't know if what you're doing is justified, as that would require a lot more information, but honestly I think that's irrelevant.

Comment by sen on Open thread, Nov. 28 - Dec. 04, 2016 · 2016-12-03T10:18:37.370Z · score: 2 (2 votes) · LW · GW

I said you sound insane because of your paranoia, not because of what you wanted to do as a result of that paranoia. Whether or not you would be creating backups in other circumstances is irrelevant, except as an indicator of how paranoid you are. I don't think such an indicator is necessary because your first two paragraphs already demonstrate what I see as an extreme level of paranoia, and so to me it's irrelevant whether you already have backups of various sites. It's perfectly reasonable for you to create backups given your beliefs. Those beliefs though I consider insane. The solution then is not to stop creating backups, as that would accomplish nothing. The solution is to stop browsing sites that are specifically designed to make you insane.

Comment by sen on Open thread, Nov. 28 - Dec. 04, 2016 · 2016-12-03T09:09:15.728Z · score: 2 (2 votes) · LW · GW

You sound insane desu.

Stop browsing reddit for a while. Any board where attention is explicitly rewarded, whether in the form of (You)s or upboats, will almost by definition tend towards encouraging high volatility of beliefs and emotions. It sounds like you've been riding that wave a bit too long.

Also, learn to recognize fear mongering.

Comment by sen on December 2016 Media Thread · 2016-12-03T08:36:14.822Z · score: 1 (1 votes) · LW · GW

Some short papers worth reading.

A good introduction to topos theory, which in turn explains why the Yoneda embedding is so useful. I would not recommend this as an introduction to category theory or the Yoneda lemma.

A love letter to adjoint functors discussing their meaning and philosophical significance.

A paper on categorification, from which I will include a quote below in hopes of showing some of you heathens the light.

If one studies categorification one soon discovers an amazing fact: many deep-sounding results in mathematics are just categorifications of facts we learned in high school! There is a good reason for this. All along, we have been unwittingly ‘decategorifying’ mathematics by pretending that categories are just sets. We ‘decategorify’ a category by forgetting about the morphisms and pretending that isomorphic objects are equal. We are left with a mere set: the set of isomorphism classes of objects.

To understand this, the following parable may be useful. Long ago, when shepherds wanted to see if two herds of sheep were isomorphic, they would look for an explicit isomorphism. In other words, they would line up both herds and try to match each sheep in one herd with a sheep in the other. But one day, along came a shepherd who invented categorification. She realized one could take each herd and ‘count’ it, setting up an isomorphism between it and some set of ‘numbers’, which were nonsense words like ‘one, two, three, . . . ’ specially designed for this purpose. By comparing the resulting numbers, she could show that two herds were isomorphic without explicitly establishing an isomorphism! In short, by decategorifying the category of finite sets, the set of natural numbers was invented.

According to this parable, decategorification started out as a stroke of mathematical genius. Only later did it become a matter of dumb habit, which we are now struggling to overcome by means of categorification. While the historical reality is far more complicated, categorification really has led to tremendous progress in mathematics during the 20th century. For example, Noether revolutionized algebraic topology by emphasizing the importance of homology groups. Previous work had focused on Betti numbers, which are just the dimensions of the rational homology groups. As with taking the cardinality of a set, taking the dimension of a vector space is a process of decategorification, since two vector spaces are isomorphic if and only if they have the same dimension. Noether noted that if we work with homology groups rather than Betti numbers, we can solve more problems, because we obtain invariants not only of spaces, but also of maps. In modern parlance, the nth rational homology is a functor defined on the category of topological spaces, while the nth Betti number is a mere function defined on the set of isomorphism classes of topological spaces. Of course, this way of stating Noether’s insight is anachronistic, since it came before category theory. Indeed, it was in Eilenberg and Mac Lane’s subsequent work on homology that category theory was born!
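The shepherd parable can be made concrete in a few lines of Python. This is a sketch; modeling herds as sets of distinct sheep names is an assumption for illustration. Matching produces an explicit bijection, a morphism you can inspect; counting only compares isomorphism classes, which is exactly the information decategorification keeps.

```python
_DONE = object()  # sentinel marking an exhausted herd


def match_herds(herd_a, herd_b):
    """Line the herds up and pair sheep one by one.

    Returns the explicit bijection as a dict, or None if one herd runs out first.
    """
    it_a, it_b = iter(herd_a), iter(herd_b)
    pairing = {}
    while True:
        a, b = next(it_a, _DONE), next(it_b, _DONE)
        if a is _DONE and b is _DONE:
            return pairing
        if a is _DONE or b is _DONE:
            return None
        pairing[a] = b


def count(herd):
    """Decategorify: forget which sheep is which and keep only the cardinality."""
    return len(herd)


herd_1 = {"dolly", "shaun", "timmy"}
herd_2 = {"larry", "mo", "curly"}
herd_3 = {"baa", "ram"}

print(match_herds(herd_1, herd_2) is not None, count(herd_1) == count(herd_2))  # True True
print(match_herds(herd_1, herd_3) is not None, count(herd_1) == count(herd_3))  # False False
```

Both methods agree on whether two herds are isomorphic, but only `match_herds` tells you which sheep goes with which.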

Comment by sen on Which areas of rationality are underexplored? - Discussion Thread · 2016-12-03T07:50:27.817Z · score: 9 (9 votes) · LW · GW

Non-Bayesian reasoning. Seriously, pretty much everything here is about experimentation, conditional probabilities, and logical fallacies, and all of the above are derived from Bayesian reasoning. Yes, these things are important, but there's more to science and modeling than learning to deal with uncertainty.

Take a look at the Wikipedia page on the Standard Model of particle physics, and count the number of times uncertainty and Bayesian reasoning are mentioned. If your number is greater than zero, then they must have changed the page recently. Bayesian reasoning tells you what to expect given an existing set of beliefs. It doesn't tell you how to develop those underlying beliefs in the first place. For much of physics, that's pretty much squarely in the domain of group theory / symmetry. It's ironic that a group so heavily based on the sciences doesn't mention this at all.

Rationality is about more than empirical studies. It's about developing sensible models of the world. It's about conveying sensible models to people in ways that they'll understand them. It's about convincing people that your model is better than theirs, sometimes without having to do an experiment.

It's not like these things aren't well-studied. It's called math, and it's been studied for thousands of years. Everything on this site focuses on one tiny branch, and there's so much more out there.

Apologies for the rant. This has been bugging me for a while now. I tried to create a thread on this a little while ago and met with the karma limitation. I didn't want to deal with it at the time, and now it's all coming back to me, rage and all.

Also, this discussion topic is suboptimal if your aim is to explore new areas of rationality, as it presumes that all unexplored areas will arise from direct discussion. It should have been paired with the question "How do we discover underexplored areas of rationality?" My answer to that is to encourage non-rational discussion where people believe, intuitively or otherwise, that it should be possible to make the discussion rational. You're not going to discover the boundaries of rationality by always staying within them. You need to look both outside and inside to see where the boundary might lie, and you need to understand non-rationality if you ever hope to expand the boundaries of rationality.

End rant.

Comment by sen on Mismatched Vocabularies · 2016-11-29T08:58:41.517Z · score: 0 (0 votes) · LW · GW

I don't think there's anything wrong with comparing 1 and 3. Yes, Reaction 1 is defined by an ideal, Reaction 2 is defined by a goal, and Reaction 3 is defined by an evolutionary impulse (whatever that means), but that does not make these things incomparable. If you have a goal in mind, you can determine how each of these three Reactions relates to the goal, and you can hope to come up with some ordering of Reactions with respect to that goal. For example, if you want to judge these reactions in terms of how well they indicate an individual's striving towards a goal that the reactor would consider worthwhile, I'd argue that Reaction 3 is, with high likelihood, a stupid option.

Regarding your sudden doubt in your own perspective, the problem here is that you didn't define a goal. By not defining a goal, your implicit goal was allowed to become something you didn't understand, and the reasoning behind your judgement was allowed to become highly subjective and non-conveyable. You can fix this particular issue by making sure to always think of "goodness" and "badness" in relation to explicitly specified goals. The more general issue is that you don't have a basis for recognizing when you believe things are reasonably comparable. You can fix this more general issue by studying more math, specifically category theory.

Regarding your actual question, try redefining your three reactions in terms of each of the three properties you used to define them: reaction from ideal, reaction from goal, reaction from evolutionary impulse (whatever that means). Under what ideal is it correct to ignore the mismatch and carry on the conversation? (I think korin43 answered most of this question.) Under what ideal is it correct to display anger? Towards what ends is it good to Google the unknown reference before responding? Towards what ends is it good to get angry?

As a side note, I've never seen evolutionary anything used as a concrete justification for a phenomenon where it couldn't equally well be used to justify the lack of the same phenomenon. More often than not, I see it as an attempt to hand-wave away a complex behavior because thinking is hard.

Comment by sen on Goal completion: noise, errors, bias, prejudice, preference and complexity · 2016-02-28T11:48:40.664Z · score: 0 (0 votes) · LW · GW

I see. You're treating "energy" as the information required to specify a model. Your analogy and your earlier posts make sense now.

Comment by sen on Goal completion: noise, errors, bias, prejudice, preference and complexity · 2016-02-25T06:21:32.002Z · score: 0 (0 votes) · LW · GW

2) I think this is the distinction you are trying to make between the lattice model and the smoker model: in the lattice model, the equations and parameters are defined, whereas in the smoker model, the equations and parameters have to be deduced. Is that right? If so, my previous posts were referring to the smoker-type model.

Your toy meta-model is consistent with what I was thinking when I used the word "model" in my previous comments.

3) I see what you're saying. If you add complexity to the model, you want to make sure that its improvement in ability is greater than the amount of complexity added. You want to make sure that the model isn't just "memorizing" the correct results, and that all model complexity comes with some benefit of generalizability.

I don't think temperature is the right analogy. What you want is to penalize a model that is too generally applicable. Here is a simple case:

Simple case: a one-hidden-layer feed-forward binary stochastic neural network whose goal is to find binary-vector representations of its binary-vector inputs. It translates its input into an internal representation of length n, then translates that internal representation into a binary-vector output of the same length as its input. The error function is the reconstruction error, measured as the KL divergence from input to output.

The "complexity" you want is the length of its internal representation in unit bits since each element of the internal representation can retain at most one bit of information, and that bit can be arbitrarily reflected by the input. The information loss is the same as the reconstruction error in unit bits since that describes the probability of the model guessing correctly on a given input stream (assuming each bit is independent). Your criterion translates to "minimize reconstruction error + internal representation size", and this can be done by repeatedly increasing the size of the internal representation until adding one more element reduces reconstruction error by less than one bit.
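A minimal Python sketch of the stopping rule just described, under the stated assumption that bits are independent. The reconstruction error is measured in bits as the negative log2-probability the network assigns to the correct input bits (for a deterministic binary input this equals the KL divergence from input to output). `recon_error_bits` stands in for whatever training-and-evaluation procedure you actually use; it is hypothetical here, as is the toy error curve.

```python
import math
from typing import Callable, Sequence


def reconstruction_error_bits(x: Sequence[int], p_out: Sequence[float]) -> float:
    """Bits lost: -sum log2 P(output bit == input bit), bits assumed independent.

    x: binary input vector; p_out: the model's probability that each output bit is 1.
    """
    return -sum(
        math.log2(p if xi == 1 else 1.0 - p) for xi, p in zip(x, p_out)
    )


def choose_representation_size(
    recon_error_bits: Callable[[int], float], max_size: int
) -> int:
    """Grow the hidden layer while one more unit (costing one bit) saves at least one bit.

    recon_error_bits(n): average reconstruction error, in bits, of a network whose
    internal representation has n binary units (hypothetical training callback).
    """
    n = 0
    while n < max_size:
        saved = recon_error_bits(n) - recon_error_bits(n + 1)
        if saved < 1.0:
            break
        n += 1
    return n


print(reconstruction_error_bits([1, 0], [0.5, 0.5]))  # 2.0
# Toy error curve (made up for illustration): each extra unit halves the error.
print(choose_representation_size(lambda n: 8 * 0.5 ** n, max_size=64))  # 3
```

With the toy curve, going from 3 to 4 units saves only half a bit while costing a full bit, so the rule stops at 3.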

Comment by sen on Goal completion: noise, errors, bias, prejudice, preference and complexity · 2016-02-24T02:05:28.105Z · score: 0 (0 votes) · LW · GW

I thought you were referring to degenerate gases when you mentioned nontrivial behavior in solid state systems since that is the most obvious case where you get behavior that cannot be easily explained by the "obvious" model (the canonical ensemble). If you were thinking of something else, I'm curious to know what it was.

I'm having a hard time parsing your suggestion. The "dropout" method introduces entropy to "the model itself" (the conditional probabilities in the model), but it seems that's not what you're suggesting. You can also introduce entropy to the inputs, which is another common thing to do during training to make the model more robust. There's no way to introduce 1 bit of entropy per "1 bit of information" contained in the input though since there's no way to measure the amount of information contained in the input without already having a model of the input. I think systematically injecting noise into the input based on a given model is not functionally different from injecting noise into the model itself, at least not in the ideal case where the noise is injected evenly.

You said that "if you add 1 bit of information, you have added 1 bit of entropy". I can't tell if you're equating the two phrases or if you're suggesting adding 1 bit of entropy for every 1 bit of information. In either case, I don't know what it means. Information and entropy are negations of one another, and the two have opposing effects on certainty-of-an-outcome. If you're equating the two, then I suspect you're referring to something specific that I'm not seeing. If you're suggesting adding entropy for a given amount of information, it may help if you explain which probabilities are impacted. To which probabilities would you suggest adding entropy, and which probabilities have information added to them?

Comment by sen on Goal completion: noise, errors, bias, prejudice, preference and complexity · 2016-02-23T04:23:08.877Z · score: 0 (0 votes) · LW · GW

It's true that the probability of a microstate is determined by energy and temperature, but the Maxwell-Boltzmann equation assumes that temperature is constant for all particles. Temperature is a distinguishing feature of two distributions, not of two particles within a distribution, and least-temperature is not a state that systems tend towards.

As an aside, the canonical ensemble that the Maxwell-Boltzmann distribution assumes is only applicable when a given state is exceedingly unlikely to be occupied by multiple particles. The strange behavior of condensed matter that I think you're referring to (Bose-Einstein condensates) is a consequence of this assumption being incorrect for bosons, where a stars-and-bars model is more appropriate.
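The counting difference behind that remark can be sketched in Python: `n` indistinguishable bosons spread over `k` single-particle states give C(n + k - 1, k - 1) configurations (stars and bars), while `n` distinguishable particles give k**n. The function names are illustrative.

```python
from math import comb


def boson_configurations(n: int, k: int) -> int:
    """Ways to place n identical bosons in k states (any occupancy allowed): stars and bars."""
    return comb(n + k - 1, k - 1)


def distinguishable_configurations(n: int, k: int) -> int:
    """Ways to place n labeled particles in k states."""
    return k ** n


for n, k in [(2, 2), (3, 2), (2, 10)]:
    print(n, k, boson_configurations(n, k), distinguishable_configurations(n, k))
```

With 2 particles in 2 states, the doubly occupied configurations make up 2 of the 3 boson configurations but only 2 of the 4 distinguishable ones, so multiply-occupied states carry more statistical weight for bosons than the classical count suggests.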

It is not true that information theory requires the conservation of information. The Ising Model, for example, allows for particle systems with cycles of non-unity gain. This effectively means that it allows particles to act as amplifiers (or dampeners) of information, which is a clear violation of information conservation. This is the basis of critical phenomena, which is a widely accepted area of study within statistical mechanics.

I think you misunderstand how models are fit in practice. It is not standard practice to determine the absolute information content of input, then to relay that information to various explanators. The information content of input is determined relative to explanators. However, there are training methods that attempt to reduce the relative information transferred to explanators, and this practice is called regularization. The penalty-per-relative-bit approach is taken by a method called "dropout", where a random "cold" model is trained on each training sample, and the final model is a "heated" aggregate of the cold models. "Heating" here just means cutting the amount of information transferred from input to explanator by some fraction.
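Here is a minimal pure-Python sketch of dropout as described: during training each input unit is independently zeroed with probability `p_drop` (one random thinned "cold" sub-model per sample), and at inference the full layer is used with its output scaled by `1 - p_drop` (the averaged "heated" model). This follows the original weight-scaling formulation; many libraries instead rescale during training ("inverted dropout"). Names and numbers are illustrative.

```python
import random
from typing import List, Optional


def dropout_layer(
    x: List[float],
    W: List[List[float]],
    p_drop: float,
    train: bool,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Linear layer W @ x with dropout applied to its inputs."""
    if train:
        rng = rng or random.Random()
        # Sample one thinned sub-model: keep each input with probability 1 - p_drop.
        kept = [xi if rng.random() >= p_drop else 0.0 for xi in x]
        return [sum(w * k for w, k in zip(row, kept)) for row in W]
    # Inference: use every unit, scaled so outputs match the training-time average.
    scale = 1.0 - p_drop
    return [scale * sum(w * xi for w, xi in zip(row, x)) for row in W]


W = [[1.0, 2.0, -1.0]]
x = [1.0, 1.0, 1.0]
rng = random.Random(0)
trials = 20000
avg = sum(dropout_layer(x, W, 0.5, train=True, rng=rng)[0] for _ in range(trials)) / trials
print(dropout_layer(x, W, 0.5, train=False))  # [1.0]
print(round(avg, 2))  # close to 1.0
```

The scaled inference output matches the average over many sampled sub-models, which is the sense in which the final model aggregates the cold ones.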

Comment by sen on Goal completion: noise, errors, bias, prejudice, preference and complexity · 2016-02-22T07:40:05.704Z · score: 0 (0 votes) · LW · GW

How is inverse temperature a penalty on models? If you're referring to the inverse temperature in the Maxwell-Boltzmann distribution, the temperature is considered a constant, and it gives the likelihood of a particle having a particular configuration, not the likelihood of a distribution.

Also, I'm not sure it's clear what you mean by "information to specify [a model]". Does a high inverse temperature mean a model requires more information, because it's more sensitive to small changes and therefore derives more information from them, or does it mean that the model requires less information, because it derives less information from inputs?

The entropy of the Maxwell-Boltzmann distribution I think is proportional to log-temperature, so high temperature (low sensitivity to inputs) is preferred if you go strictly by that. People that train neural networks generally do this as well to prevent overtraining, and they call it regularization.
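A small Python sketch of the point about temperature: for a fixed set of energy levels, one temperature `T` is shared by every state of the distribution, and raising it flattens the Boltzmann distribution and increases its entropy. The energy levels are made up, units have k = 1, and this only shows the direction of the effect, not any specific log-temperature scaling.

```python
import math
from typing import List


def boltzmann(energies: List[float], T: float) -> List[float]:
    """p_i = exp(-E_i / T) / Z with k_B = 1; a single T applies to every state."""
    e_min = min(energies)  # shift energies for numerical stability; cancels in Z
    weights = [math.exp(-(E - e_min) / T) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights]


def entropy_nats(p: List[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


levels = [0.0, 1.0, 2.0]  # hypothetical energy levels
for T in (0.25, 0.5, 1.0, 2.0, 8.0):
    p = boltzmann(levels, T)
    print(f"T={T:<5} p={[round(v, 3) for v in p]} S={entropy_nats(p):.3f}")
print(f"max possible S = ln 3 = {math.log(3):.3f}")
```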

If you are referring to the entropy of a model, you penalize a distribution for requiring more information by selecting the distribution that maximizes entropy subject to whatever invariants your model must abide by. This is typically done through the method of Lagrange multipliers.
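The Lagrange-multiplier step can be written out for a discrete distribution with one invariant, a fixed mean energy ⟨E⟩ (chosen here as the standard example). Maximizing entropy under that constraint recovers the Boltzmann form, with the multiplier β playing the role of inverse temperature.

```latex
\begin{aligned}
\mathcal{L} &= -\sum_i p_i \ln p_i
  - \alpha\Big(\sum_i p_i - 1\Big)
  - \beta\Big(\sum_i p_i E_i - \langle E \rangle\Big) \\
\frac{\partial \mathcal{L}}{\partial p_i} &= -\ln p_i - 1 - \alpha - \beta E_i = 0
  \;\Longrightarrow\; p_i = e^{-1-\alpha}\, e^{-\beta E_i} \\
\sum_i p_i = 1 &\;\Longrightarrow\; p_i = \frac{e^{-\beta E_i}}{Z},
  \qquad Z = \sum_j e^{-\beta E_j}
\end{aligned}
```

Identifying β with 1/(k_B T) gives the usual Boltzmann distribution.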