Comments
The web game suffers greatly from the network effect. There's just very little chance you'll get three or more people to log on simultaneously, and of course, because of this, people will give up on trying, which worsens the effect.
Maybe we can designate, say, 12:00 AM and PM, UTC, as hours at which people should log on? This will make it easier to reach critical mass.
Proving a program to be secure all the way down from applying Schrödinger's equation to the quarks and electrons the computer is made of is way beyond our current abilities, and will remain so for a very long time.
Challenge accepted! We can do this, we just need some help from a provably friendly artificial superintelligence! Oh wait...
I've written a post on my blog covering some aspects of AGI and FAI.
It probably has nothing new for most people here, but could still be interesting.
I'd be happy to get feedback - in particular, I can't remember whether my analogy with flight is something I came up with or something I heard here long ago. I'd be glad to hear whether it's novel, and whether it's any good.
How many hardware engineers does it take to develop an artificial general intelligence?
The perverse incentive to become alcoholic or obese can be easily countered with a simple rule - a person chosen in the lottery is sacrificed no matter what, even if he doesn't actually have viable organs.
To be truly effective, the system needs to consider the fact that some people are exceptional and can contribute to saving lives much more effectively than by being scrapped and harvested for spare parts. Hence, there should actually be an offer to anyone who loses the lottery: either pay $X or be harvested.
A further optimization is monetary compensation to (the inheritors of) people who are selected, proportional to the value of the harvested organs. This reduces the overall individual risk, and gives people even more reason than usual to stay healthy.
All of this is in the LCPW, of course. In the real world, I'm not sure there is enough demand for organs that the system would be effective in scale. Also, note that a key piece of the original dilemma is that the traveler has no family - in this case, the cost of sacrifice is trivial compared to someone who has people that care about him.
Of course. I'm not recommending to any genes that they have their host go celibate. I just disagree with the deduction "if you're celibate you can't have children, so there's no way your genes could benefit from it, QED".
If your own celibacy somehow helps your relatives (who have a partial copy of your genes) reproduce, then the needs of your genes have been served. In general, genes have ways to pursue their agenda other than have their host reproduce. Sometimes genes even kill their host in an attempt to help copies of themselves in other hosts.
Is the link to "Logical disjunction" intentional?
I was born in May, and I approve this message.
Ouch, I completely forgot about this (or maybe I never knew about it?), and that's a talk I wanted to hear...
Is it possible perhaps to get it in text form?
It's worth mentioning that the EFF resumed accepting Bitcoin donations a while ago.
I don't suppose it's possible to view the version history of the post, so can you state for posterity what "DOCI" used to stand for?
I think some factor that discounts votes over time should be included. Exponential decay seems reasonable, and the decay time constant can be calibrated against the overall data in the domain (assuming we have data on voting times available).
I think it's reasonable to model this as a Poisson process. There are many people who could in theory vote, only few of them do, at random times.
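Something like the following is what I have in mind - a minimal sketch, where the function name and the one-week decay constant are placeholders; in practice tau would be fit from the domain-wide voting-time data:

```python
import math

def time_decayed_score(vote_times, now, tau=7 * 24 * 3600):
    """Sum of votes on one item, each weighted by exp(-age / tau).

    vote_times: timestamps (seconds) of the individual votes; signed votes
    could be handled by multiplying each weight by +1 or -1.
    tau: decay constant in seconds (a placeholder of one week here), to be
    calibrated from the domain-wide distribution of voting times.
    """
    return sum(math.exp(-(now - t) / tau) for t in vote_times)
```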
Given that a and b are arbitrary, I think the differences can be large. Whether they actually are large for typical datasets I can't readily answer.
In any case the advantages are:
Simplicity. Tuning the parameters is a bit involved, but once you do, the formula to apply to each item is very simple (a sketch follows after this list). In many (not all) cases, a complicated formula reflects insufficient understanding of the problem.
Motivation. Taking the lower bound of a confidence/credible interval makes some sense, but it's not that obvious. The need for it arises because we don't model the prior mean, so we don't want to take a risk on unproven items. A posterior mean of the quality is more natural, and won't cause many problems, because items default to the true population mean.
Parametrization. The interval method has a parameter for the coverage probability of the interval, but it's not at all clear how to choose it. My method has parameters for the mean and variance, which are based on the data.
Generalization. This framework makes it easier to clearly think about what we want, and replace the posterior mean of p with a posterior mean of some other quantity of interest. e.g., the suggested "explore vs. exploit" tends to give something closer to an interval upper bound than lower bound, and other methods have been suggested.
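Here is a minimal sketch of what I mean; the method-of-moments fit is just one illustrative way of picking a and b from the domain-wide mean and variance of item quality:

```python
def fit_beta_prior(mean, var):
    """Method-of-moments fit of a Beta(a, b) prior to the domain-wide mean
    and variance of item quality (requires 0 < var < mean * (1 - mean))."""
    strength = mean * (1 - mean) / var - 1   # this is a + b
    return mean * strength, (1 - mean) * strength  # a, b

def item_score(up, down, a, b):
    """Posterior mean of the item's quality p under the Beta(a, b) prior."""
    return (a + up) / (a + b + up + down)
```

Note that an item with no votes scores a/(a+b), the domain's average quality, which is exactly the "default to the population mean" behavior described above.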
True. This is a problem since the current net vote count is mutable, while an individual vote, once cast, is not. You could try fitting a much more complicated model that can reproduce this behavior, calibrate it with A/B testing, etc. Or maybe try to prevent it by sorting according to quality, but not actually displaying the metrics.
But I fear that it would cause irreparable damage if the world settles on this solution.
This is probably vastly exaggerating the possible consequences; it's just a method of sorting, and both the Wilson interval method and a Bayesian method are definitely far better than the naive methods.
I just feel that it will place this low-hanging fruit out of reach. e.g.,
Me: Hey Reddit, I have this cool new sorting method for you to try!
Reddit: What do you mean? We've already moved beyond the naive methods into the correct method. Here, see Miller's paper. No further changes are needed.
Maybe I'm exaggerating - I mean, things can be improved again after being improved once - but I just feel that if the world had a "naive rating method" itch to scratch, and something like Miller's method became the go-to method, something is wrong.
It means that the model used per item doesn't have enough parameters to encode what we know about the specific domain (where domain is "Reddit comments", "Urban dictionary definitions", etc.)
The formulas discussed define a mapping from pairs (positive votes, negative votes) to a quality score. In Miller's model, the same mapping is used everywhere, without consideration of the characteristics of the specific domain. In my model, there are parameters a and b (or alternatively, a/(a+b) and a+b) that we first train per domain, and then apply per item.
For example, let's say you want to decide the order of a (5, 0) item and a (40, 10) item. Miller's model just gives one answer. My model gives different answers (see the numeric sketch below) depending on:
The average quality - if the overall item quality is high (say, most items have 100% positive votes), the (5,0) item should be higher because it's likely one of those 100% items, while (40,10) has proven itself to be of lower quality. If, however, most items have low quality, (40,10) will be higher because it has proven itself to be one of the rare high-quality items, while (5,0) is more likely to be a low-quality item which lucked out.
The variance in quality - say the average quality is 50%. If the variance in quality is low, (5,0) will be lower because it is likely to be an average item which lucked out, while (40, 10) has proven to be of high quality. If the variance is high (with most items being either 100% or 0%), (5,0) will be higher because in all likelihood it is one of the 100% items, while (40, 10) has proven to be only 80%.
In short, using a cookie-cutter model without any domain-specific parameters doesn't make the most efficient use of the data possible.
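To make the two cases above concrete, here is a small numeric illustration; the (a, b) values are made up purely for the example:

```python
def score(up, down, a, b):
    # Posterior mean of quality under a Beta(a, b) prior, as described above.
    return (a + up) / (a + b + up + down)

# Hypothetical domain priors matching the scenarios described above.
priors = {
    "high average quality (mean 0.95)":   (19.0, 1.0),
    "low average quality (mean 0.30)":    (3.0, 7.0),
    "mean 0.5, low variance in quality":  (50.0, 50.0),
    "mean 0.5, high variance in quality": (0.5, 0.5),
}

for name, (a, b) in priors.items():
    s1, s2 = score(5, 0, a, b), score(40, 10, a, b)
    winner = "(5,0)" if s1 > s2 else "(40,10)"
    print(f"{name}: (5,0) -> {s1:.3f}, (40,10) -> {s2:.3f}; {winner} ranks higher")
```

Running this gives exactly the flips described: (5,0) ranks higher under the high-quality and high-variance priors, while (40,10) ranks higher under the low-quality and low-variance priors.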
This is interesting, especially considering that it favors low-data items, as opposed to both the confidence-interval-lower-bound and the notability adjustment factor, which penalize low-data items.
You can try to optimize it in an explore-vs-exploit framework, but there would be a lot of modeling parameters, and additional kinds of data will need to be considered. Specifically, a measure of how many of those who viewed the item bothered to vote at all. Some comments will not get any votes simply because they are not that interesting; so if you keep placing them on top hoping to learn more about them, you'll end up with very few total votes because you show people things they don't care about.
The beta distribution is a conjugate prior for Bernoulli trials, so if you start with such a prior the posterior is also beta, which greatly simplifies the calculations. It also converges to normal for large alpha and beta, and in any case can be fit into any mean and variance, so it's a good choice.
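Concretely: with a Beta(a, b) prior and n votes of which k are positive, the posterior is Beta(a + k, b + n - k), with posterior mean (a + k) / (a + b + n) - just the standard conjugate update.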
Whatever your target function is, you'll want the item with the greatest posterior mean for this target. To do this generally you'll need the posterior distribution of p rather than the mean of p itself. But the distribution just describes what you know about p, it doesn't itself encode properties such as "controversial".
Well, I think there is some sense of Bayesianism as a meta-approach, without regard to specific methods, which most of us would consider healthier than the frequentist mindset.
There are surely papers showing the superiority of frequentism over Bayesianism, and papers showing the differences between various flavors of Bayesianism and various flavors of frequentism. But that's not what I'm after right now (with the understanding that a paper can be on the "Bayesian" side and be correct).
(Link to How Not To Sort By Average Rating.)
I forgot to link in the OP. Then remembered, and forgot again.
Something of interest: the Jeffreys interval. Using the lower bound of a credible interval based on that distribution (which is the same as yours) will probably give better results than just using the mean: it handles small sample sizes more gracefully. (I think, but I'm certainly willing to be corrected.)
This seems to use specific parameters for the beta distribution. In the model I describe, the parameters are tailored per domain. This is actually an important distinction.
I think using the lower bound of an interval makes every item "guilty until proven innocent" - with no data, we assume the item is of low quality. In my method, an item with no data gets the mean quality of all items (and it is important that we calibrate the parameters for the domain). Which is better is debatable.
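For a concrete comparison of the two behaviors, a small sketch (the 95% level and the Beta(3, 7) domain prior are arbitrary choices for illustration):

```python
from scipy.stats import beta

def jeffreys_lower_bound(up, down, level=0.95):
    # Lower end of the equal-tailed Jeffreys credible interval:
    # Beta(1/2, 1/2) prior, so the posterior is Beta(up + 1/2, down + 1/2).
    return beta.ppf((1 - level) / 2, up + 0.5, down + 0.5)

def posterior_mean(up, down, a, b):
    # Posterior mean under a domain-calibrated Beta(a, b) prior.
    return (a + up) / (a + b + up + down)

# With no votes: "guilty until proven innocent" vs. defaulting to the domain mean.
print(jeffreys_lower_bound(0, 0))           # ~0.0015, essentially zero
print(posterior_mean(0, 0, a=3.0, b=7.0))   # 0.3, the (hypothetical) domain mean
```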
In the notation of that post, I'd say I am interested mostly in the argument over "Whether a Bayesian or frequentist algorithm is better suited to solving a particular problem", generalized over a wide range of problems. And the sort of frequentism I have in mind seems to be "frequentist guarantee" - the process of taking data and making inferences from it on some quantity of interest, and the importance to be given to guarantees on the process.
Would it? Maybe the question (in its current form) isn't good, but I think there are good answers for it. Those answers should be prominently searchable.
Except it's not really a prediction market. You could know the exact probability of an event happening, which is different from the market's opinion, and still not be able to guarantee profit (on average).
but the blue strategy aims to maximize the frequency of somewhat positive responses while the red strategy aims to maximize the frequency of highly positive responses.
It's the other way around.
I guess the Umesh principle applies. If you never have to throw food away, you're preparing too little.
If you haven't already, you can try deepbit.net. I did, and it's working nicely so far.
Thanks, will do.
do you know that group? do you want their contact info?
No, and no need - I trust I'll find them should the need arise.
I'm interested in being there, but that's a pretty long drive for me. Is there any chance to make it in Tel-Aviv instead?
At the risk of stating the obvious: The information content of a datum is its surprisal, the negative logarithm of its prior probability. If I currently give a 1% chance that the cat in the box is dead, discovering that it is dead gives me 6.64 bits of information.
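In symbols: I(x) = -log2(P(x)), and indeed -log2(0.01) = log2(100) ≈ 6.64 bits.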
Eliezer Yudkowsky can solve EXPTIME-complete problems in polynomial time.
Sorry, I'm not sure I know how to answer that.
Just in case anyone didn't get the joke (rot13):
Gur novyvgl gb qvivqr ol mreb vf pbzzbayl nggevohgrq gb Puhpx Abeevf, naq n fvathynevgl, n gbcvp bs vagrerfg gb RL, vf nyfb n zngurzngvpny grez eryngrq gb qvivfvba ol mreb (uggc://ra.jvxvcrqvn.bet/jvxv/Zngurzngvpny_fvathynevgl).
When Eliezer Yudkowsky divides by zero, he gets a singularity.
Now that I've looked it up, I don't think it really has the same intuitions behind it as mixed strategy NE. But it does have an interesting connection with swings. If you try to push a heavy pendulum one way, you won't get very far. Trying the other way you'll also be out of luck. But if you push and pull alternately at the right frequency, you will obtain an impressive amplitude and height. Maybe it is because I've had firsthand experience with this that I don't find Parrondo's paradox all that puzzling.
Okay, then it looks like we are in agreement.
I'll consider it, but I don't know if I'm the right person for that, or if I'll have the time.
Short answer: I already addressed this. Is your point that I didn't emphasize it enough?
One thing should be kept in mind. A Nash equilibrium strategy, much like a minimax strategy, is "safe". It makes sure your expected payoff won't be too low no matter how clever your opponent is. But what if you don't want to be safe - what if you want to win? If you have good reason to believe you are smarter than your opponent, that he will play a non-equilibrium strategy you'll be able to predict, then go ahead and counter that strategy. Nash equilibria are for smart people facing smarter people.
Long answer:
The distribution of sword/armor choices in an MMO will not be the Nash equilibrium with overwhelming probability. In fact, it probably won't be anywhere close if the choice is at all complicated.
Correct. Do you know what the distribution is? Can you gather statistics? Do you understand the mentality of your opponents so well that you can predict their actions? Can you put the game in some reference class and generalize from that? If any of the above, knock yourself out. Otherwise there's no justification to use anything but the equilibrium.
If you are playing an MMO with random PvP, populated by the people who play MMOs, and you choose the Nash equilibrium, you will probably do worse than me.
If you assume I will continue to use the NE, even after collecting sufficient statistics to show that the players follow some specific distribution (as opposed to "not Nash equilibrium"), then this is a strawman. If you're just saying you're very good at understanding the mind of the MMOer then that's likely, but doesn't have much to do with the post.
But saying that someone who talks about what their opponents are likely to do is "wrong" is itself quite wrong.
I did not criticize the rare swords&armor posts that actually tried to profile their opponents and predict their actions. I criticized the posts that tried to do some fancy math to arrive at the "optimal" solution inherent to the game, and then failed to either acknowledge or reconcile the fact that their solution is unstable.
Reasoning about the distribution of strategies your opponents play is the correct approach to this problem, not reasoning about the Nash equilibrium.
One should first learn how to do Nash equilibrium, then learn how to not do Nash equilibrium. NE is the baseline default choice. If someone doesn't understand the problem well enough to realize that any suggested deterministic strategy will be unstable and that the NE is the answer to that, what hope does he have with the harder problem of reasoning about the distribution of opponents?
If 90% of American generals in the past have chosen to attack West, it is probably wrong to defend East with the "optimal" probability.
Even under the strong assumption that there is some clear reference class about which you can collect these statistics, this is true only if one of the following conditions holds:
- The attacker hasn't heard about those statistics.
- The attacker is stupid.
- The attacker is plagued with the same inside-view biases that made all his predecessors attack west.
The plausibility of each depends on the exact setting. If one holds, and we know it, and we really do expect to be attacked west with 90% probability, then this is no longer a two-player game. It's a one-player decision theory problem where we should take the action that maximizes our expected utility, based on the probability of each contingency.
But if all fail, we have the same problem all over again. The attacker will think, "paulfchristiano will look at the statistics and defend west! I'll go east". How many levels of recursion are "correct"?
Fixed.
Thanks. I'll move it as soon as I have the required karma.
I don't know if I qualify as a LW-er, but I'm in Israel. I'll be happy to meet you two if you are interested. I'm in Tel-Aviv every day.
Gambit said the only equilibrium was mixed, with 1/5 each of (blue sword, blue armor), (blue sword, green armor), (yellow sword, yellow armor), (green sword, yellow armor), and (green sword, green armor).
FWIW, my calculations confirm this - you beat me to posting. One nitpick - this is not the only equilibrium: you can transfer up to 10% of the weight from (blue, green) to (red, green).
But if you can't look at the current distribution, you still need to use the equilibrium for this single choice. Otherwise, you're at risk that everyone will think the same as you, except for a few smarter players who will counter it.
I actually briefly considered mentioning correlated equilibria, but the post was getting long already.
You can't have your cake and eat it too. If the probability is low enough, or the penalty mild enough, that the rational action is to take the gamble, then necessarily the expected utility will be positive.
Taking your driving example, if I evaluate a day of work as 100 utilons, my life as 10MU, and estimate the probability to die while driving to work as 1/M, then driving to work has an expected gain of 90U.
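Spelled out (reading "M" as a million, so my life is 10^7 U and the risk is 10^-6): 100 U - 10^-6 × 10^7 U = 100 - 10 = 90 U.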
What exactly does maximizing expected utility yield in these particular cases?
For one, I could be convinced not to take A (0.01 could be too risky) but I would never take B.
Depends on how much money you currently have. According to the simple logarithmic model, you should take gamble B if your net worth is at least $2.8M.
Suppose someone offers you a (single trial) gamble C in which you stand to gain a nickel with probability 0.95 and stand to lose an arm and a leg with probability 0.05. Even though the expectation is (-0.05·arm - 0.05·leg + 0.95·nickel), you should still take the gamble, since the probability of winning on a single trial is very high - 0.95 to be exact.
Non-sarcastic version: Losing $100M is much worse than gaining $100K is good, regardless of utility of money being nonlinear. This is something you must consider, rather than looking at just the probabilities - so you shouldn't take gamble A. This is easier to see if you formulate the problems with gains and losses you can actually visualize.
Hi.
I intend to become more active in the future, at which point I will introduce myself.