Bayesian Collaborative Filtering

post by JGWeissman · 2010-04-03T23:29:30.507Z · score: 14 (15 votes) · LW · GW · Legacy · 23 comments

I present an algorithm I designed to predict which position a person would report for an issue on TakeOnIt, through Bayesian updates on the evidence of other people's positions on that issue. Additionally, I will point out some potential areas of improvement, in the hopes of inspiring others here to expand on this method.


For those not familiar with TakeOnIt, the basic idea is that there are issues, represented by yes/no questions, on which people can take the positions Agree (A), Mostly Agree (MA), Neutral (N), Mostly Disagree (MD), or Disagree (D). (There are two types of people tracked by TakeOnIt: users who register their own opinions, and Experts/Influencers whose opinions are derived from public quotations.)

The goal is to predict what position a person S would take on an issue I, based on the positions registered by other people on that issue. To do this, we will use Bayes' Theorem to update the probability that person S takes position X on issue I, given that person T has taken position Y on issue I:

P(S takes X on I | T takes Y on I) = P(S takes X on I)*P(T takes Y on I | S takes X on I)/P(T takes Y on I)

Really, we will be updating on several people Tj taking positions Yj on I:

P(S takes X on I | for all j, Tj takes Yj on I) = P(S takes X on I)*Product over j of (P(Tj takes Yj on I | S takes X on I)/P(Tj takes Yj on I))

To compute this, let us first figure out the prior probability P(S takes X on I). I use for this a generalization of Laplace's Law of Succession (representing my theory that a person will take each position with a particular frequency, and that there is no reason, before seeing their actual position, to suppose that one position in particular is more frequent than the others), that the odds that S takes the position A : MA : N : MD : D  on I is given by:

1 + count of issues S has taken position A on : 1 + count of issues S has taken position MA on : 1 + count of issues S has taken position N on : 1 + count of issues S has taken position MD on : 1 + count of issues S has taken position D on

Thus, the probability

P(S takes X on I) = (1 + count of issues S has taken X on)/(5 + count of issues S has taken any position on)

Likewise the probability

P(Tj takes Yj on I) = (1 + count of issues Tj has taken Yj on)/(5 + count of issues Tj has taken any position on)
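These Laplace-smoothed priors are easy to compute. Here is a minimal sketch (the function name and data layout are my own illustrative choices, not the post's SQL implementation):

```python
from collections import Counter

# The five possible positions: Agree, Mostly Agree, Neutral,
# Mostly Disagree, Disagree.
POSITIONS = ["A", "MA", "N", "MD", "D"]

def laplace_prior(history):
    """Laplace-smoothed prior over the five positions, given the
    list of positions a person has taken on past issues."""
    counts = Counter(history)
    total = len(history)
    return {p: (1 + counts[p]) / (5 + total) for p in POSITIONS}

# A person who has taken D on 3 issues and MD on 1:
prior = laplace_prior(["D", "D", "MD", "D"])
# prior["D"] == 4/9, prior["A"] == 1/9
```

Note that a person with no recorded positions gets the uniform prior of 1/5 on each position, as intended.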

This leaves one term in Bayes' Theorem to figure out: P(Tj takes Yj on I | S takes X on I)

For this, I will again use the Generalized Laplace's Law of Succession, looking at issues on which both S and Tj have taken positions:

P(Tj takes Yj on I | S takes X on I) = (1 + count of issues S takes X on and Tj takes Yj on)/(5 + count of issues S takes X on and Tj takes any position on)

We now know how to compute, from the records of people's positions on issues, all the terms that Bayes' Theorem requires to compute the posterior probability that person S will take position X on issue I.
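To make the whole update concrete, here is a minimal Python sketch of the algorithm (the original implementation is a SQL script; the `history` data structure and function names here are my own illustrative assumptions):

```python
from collections import Counter

POSITIONS = ["A", "MA", "N", "MD", "D"]

def smoothed(count, total):
    """Generalized Laplace's Law of Succession over 5 outcomes."""
    return (1 + count) / (5 + total)

def predict(history, s, issue, others):
    """Posterior over person s's position on `issue`, updating naively
    on the recorded position of each person t in `others`.
    `history` maps person -> {issue: position}."""
    s_hist = history[s]
    # Prior: Laplace-smoothed frequency of each position for s.
    s_counts = Counter(s_hist.values())
    post = {x: smoothed(s_counts[x], len(s_hist)) for x in POSITIONS}
    for t in others:
        t_hist = history[t]
        if issue not in t_hist:
            continue
        y = t_hist[issue]
        # P(t takes y on I): smoothed over all of t's positions.
        t_counts = Counter(t_hist.values())
        p_y = smoothed(t_counts[y], len(t_hist))
        # Issues (other than `issue`) where both s and t took positions.
        shared = [i for i in s_hist if i in t_hist and i != issue]
        for x in POSITIONS:
            x_issues = [i for i in shared if s_hist[i] == x]
            joint = sum(1 for i in x_issues if t_hist[i] == y)
            # P(t takes y | s takes x), smoothed over shared issues.
            post[x] *= smoothed(joint, len(x_issues)) / p_y
    # Normalize, since the likelihood ratios need not sum to 1.
    z = sum(post.values())
    return {x: p / z for x, p in post.items()}
```

For example, if T has matched S position-for-position on every shared issue, then T's position on a new issue pulls S's posterior strongly toward that same position.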


So, how well does this work? At this time, I have coded up a SQL script, run against TakeOnIt's database, that predicts a user's positions based on the positions of Experts/Influencers. (TakeOnIt stores positions for these types of users differently, which is why the first version doesn't just update on the positions of all people.) I have run this script to make predictions for myself, seeing as I am in a privileged position to judge the accuracy of those predictions. Looking at the predictions it made for me with greater than 80% confidence: of the 3 predictions made with more than 90% confidence, all were correct, and of the 11 made with between 80% and 90% confidence, 10 were correct and 1 was incorrect. From this limited data, it seems the algorithm is underconfident. I have registered my opinion on 40 issues.

In case you think my positions might have been influenced by the algorithm's predictions, I have also looked at its retrodictions for the positions I had already registered. It assigned 18% probability to my Neutral position on the issue Are successful entrepreneurs big risk takers?. My remaining 39 positions it predicted with confidence ranging from 60% to (due to round-off errors on one issue) 100%.


Some areas for improvement:

This algorithm does not make any use of the structure of the possible positions. For example, Disagree is more like Mostly Disagree than Agree. And there is also symmetry, such that Agree relates to Mostly Agree in the same way that Disagree relates to Mostly Disagree. If you changed a question by adding a negation, so that all the answers flipped, this algorithm would not necessarily give the flipped probability distribution. Of course, it is also possible that a person's positions will not reflect this structure, so we should not completely impose it on the algorithm. But it could be an improvement to measure how well a person follows this structure (and how well people in general follow it), and adjust the results accordingly.

The algorithm has a violation of Conservation of Expected Evidence. When it is computing the probability that a person S will take position X on issue I, it has an expectation that person U will take position Z on issue I, which would alter its prediction for person S. But trying to extend the algorithm to recursively make predictions for U to use in its predictions for S would lead to infinite recursion.

23 comments


comment by Daniel_Burfoot · 2010-04-06T22:03:28.143Z · score: 5 (5 votes) · LW · GW

Your updating method is more properly called "Naive Bayes". Naive Bayes works well if the events you are conditioning on (Tj takes Yj on I) are independent. It's not clear how reasonable that assumption is for your application.

To see the problem, let's say I agree with 80% probability with Eliezer and Robin. Now consider some new issue on which both agree. What's the probability I also agree? Naive Bayes predicts much higher than 80%; it effectively double-updates, once for each influence. But that's probably wrong - since Eliezer's opinions overlap strongly with Robin's, the conditioning events are not independent.
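In odds form, the double-count looks like this (illustrative numbers, not TakeOnIt data):

```python
# Binary hypothesis: I agree with the new issue, prior odds 1:1.
# Each witness's agreement carries a likelihood ratio of 4:1
# (P = 0.8 they agree when I do, P = 0.2 when I don't).
prior_odds = 1.0
lr = 0.8 / 0.2  # 4.0 per supposedly independent witness

one_update = prior_odds * lr         # odds after one witness
two_updates = prior_odds * lr * lr   # naive Bayes: multiply again

p_one = one_update / (1 + one_update)    # 0.8
p_two = two_updates / (1 + two_updates)  # 16/17, about 0.94
```

If the second witness's opinion is fully determined by the first's, that second factor of 4 adds no real information, and the jump from 0.8 to 0.94 is pure overconfidence.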

To do better, you need a method that can capture this kind of conditional independence effect (my opinion is probably independent of Eliezer's given Robin's). I would try a boosting method like AdaBoost.

As a meta-level note, you shouldn't imagine that any one technique is "correct" and some other incorrect. The only criterion you should use is empirical performance. Specifically, you can calculate the negative log likelihood (compression rate) each method achieves for the database and select the one that provides the lowest number.

comment by JGWeissman · 2010-04-07T03:57:22.249Z · score: 0 (0 votes) · LW · GW

One way of looking at this problem, that fits well with an expansion I am trying to figure out, is that when I know Robin's position on the question, I should update my probability distribution for Eliezer's position, which should affect the likelihood ratios for Eliezer's observed position with respect to the positions of the person whose opinion is being predicted.

comment by BenAlbahari · 2010-04-04T04:41:44.461Z · score: 5 (5 votes) · LW · GW

I'm the founder of TakeOnIt, so let me add a little here.

JGWeissman's algorithm, or an evolution of it based on the feedback here, will be replacing the crude algorithm that's already online. You can see the results of that algorithm here:

Predicting Eliezer Yudkowsky's Opinions

If anyone else needs access to the database let me know.

Part of the vision here is to help people choose their beliefs in areas where they don't have domain expertise. This concept was described here.

In addition, this technique can also be used to detect correct contrarian opinions, as described here. The algorithm will predict that person S should believe in a minority opinion I if S has similar opinions to the set of people T, where T hold the opinion I.

comment by fburnaby · 2010-04-04T15:39:05.657Z · score: 5 (5 votes) · LW · GW

An extremely minor quibble with your site (I'm just trying to be helpful):

You ask for an opinion using a question: "Does god exist?", but then you allow people to provide answers as if they're agreeing or disagreeing with an affirmative statement (options: agree, neutral, disagree). The grammatical disconnect caused me some confusion when I first looked at the site. I think changing this to be either:

(1) "God exists" [agree, neutral disagree]

(2) "Does god exist?" [yes, maybe, no]

would make this more clear.

I haven't gotten to really read the above article yet, but do you think the proposed method would perform substantially better than a simple cluster analysis?

comment by BenAlbahari · 2010-04-05T01:31:22.535Z · score: 0 (0 votes) · LW · GW

That's a good point. I've added it to the user voice feature suggestions:

http://takeonit.uservoice.com/forums/44305-general

comment by JGWeissman · 2010-04-04T16:14:26.249Z · score: 0 (0 votes) · LW · GW

do you think the proposed method would perform substantially better than a simple cluster analysis?

Can you describe an algorithm that uses "simple cluster analysis" to predict what position a person S will take on an issue I?

comment by Johnicholas · 2010-04-04T21:42:14.842Z · score: 1 (1 votes) · LW · GW

There's a system (I think maintained by NASA) called AutoClass, which is fairly easy to use. As I understand it, it accepts input "points" (who here would be people) and outputs clusters of similar points (people).

In order to predict using AutoClass, I think you would model unanswered questions as "missing values", and then predict based on observed frequencies from the same cluster.

There's some ad-hoc-ish-ness about the way AutoClass decides how many clusters there should be, but it's a solid, existing technology that has been used successfully in many applications.

comment by BenAlbahari · 2010-04-05T03:12:14.571Z · score: 3 (3 votes) · LW · GW

There's some ad-hoc-ish-ness about the way AutoClass decides how many clusters there should be...

If a collaborative filtering algorithm is accurate, that's all that really matters to the consumers of the algorithm. It's primarily the designers of the algorithm who care about the scientific basis as to why the algorithm works.

A decent overview of the various CF algorithms:
http://www.hindawi.com/journals/aai/2009/421425.html

I find it both amusing and disturbing how pioneers in this field have been trying to optimize guessing movie preferences (recall the famous Netflix $1 million prize), when we can use these techniques to actually predict stuff that IMBO "really matters". It's yet another interesting data point as to what people really care about.

comment by orthonormal · 2010-04-07T00:21:59.877Z · score: 1 (1 votes) · LW · GW

Rather, it's a reminder that more effort is spent on projects that can be immediately profitable, however trivial they may be in scope. If there were a pay market for intellectual content as robust as the one for movies currently is, we'd have seen this done already. (Alas, I don't see how that could happen in the near future.)

comment by BenAlbahari · 2010-04-07T07:42:42.021Z · score: 1 (1 votes) · LW · GW

I've been considering the question: why has using Collaborative Filtering to predict the "trivial" opinions about movies been prioritized over predicting the "important" opinions about political/social/economic/etc. issues?

On reflection, I don't actually think it's because people care more about the former than the latter. Would you rather have a prediction regarding the opinion as to whether the movie Titanic was good, or a prediction regarding the opinion as to whether there's a housing bubble?

I think the answer is that opinions about products are naturally schematized, and hence easy to collate. Products are already tracked everywhere in databases, so it's pretty easy to extend that model to add opinions about those products. In contrast, opinions about issues, although often even more passionate than opinions about products, are not as naturally schematizable, hence they're harder to collate. Even in terms of representing the identity of an issue, it's not like we have the equivalent of an ISBN number for each issue. So opinions about issues are not adequately schematized and hence we can't collate those opinions into the nice big datasets we'd want to make predictions. Obviously websites like TakeOnIt are trying to change that. Each question ID is analogous to an ISBN number for that issue, if you will.

Yes, I agree with you that opinions of products can help sell products, so predicting opinions about products has the incentive of an immediately obvious monetization strategy. But if there's money to be made depending on correctly predicting the answer to a question, then the potential for monetization of the prediction of those opinions is also there.

comment by BenAlbahari · 2010-04-07T01:12:22.657Z · score: 0 (0 votes) · LW · GW

Well, there's a subset of questions on TakeOnIt where the correct answer has a financial reward/impact. An example of such a question was Is there a housing bubble in the United States?. These types of questions overlap with the kind of questions seen on prediction markets (which are a nice model for monetizing intellectual content). I'd be curious as to the relative accuracy of prediction markets versus collaborative filtering on expert predictions.

comment by fburnaby · 2010-04-06T15:21:03.411Z · score: 0 (0 votes) · LW · GW

I had vaguely what Johnicholas mentions in mind.

comment by wnoise · 2010-04-05T04:36:08.325Z · score: 0 (0 votes) · LW · GW

I've finally taken a look at this site. I'm strongly tempted to add H.P. Lovecraft quotes to many issues, but of course he's not an expert in the relevant senses, and the easily findable quotes are from fiction which should not generally be taken as his own.

comment by BenAlbahari · 2010-04-15T11:49:21.259Z · score: 2 (2 votes) · LW · GW

Based on your idea and the discussion that followed, I've added the feature to flag a quote as fictional.

On the question pages, fictional quotes are put in their own group:
Is information-theoretic death the most real interpretation of death?

On the expert pages, fictional quotes are flagged per quote:
H. P. Lovecraft's Opinions

Fictional quotes are discounted from the prediction analysis.

comment by BenAlbahari · 2010-04-05T04:40:33.816Z · score: 0 (0 votes) · LW · GW

Thinking out loud (this could be a terrible idea, "green hat" thinking alert!): I wonder if it would be interesting to be able to tag a quote as "fiction". There's so many insightful quotes that are spoken through the fictional characters of great authors. It seems a shame that such quotes are "illegitimate". Better to perhaps allow the quotes but tag them appropriately so they can be filtered out of prediction analysis. Thoughts?

comment by khafra · 2010-04-06T20:17:06.059Z · score: 1 (1 votes) · LW · GW

How would they be attributed? Valentine Michael's opinions are substantially different from Lazarus Long's.

comment by BenAlbahari · 2010-04-07T01:20:07.668Z · score: 1 (1 votes) · LW · GW

Simple model: a flag on a quote, present if it's a fictional character, with text preceding the quote explaining the source.

Complex model: Each fictional character is on par with an expert/influencer, with an extra field referencing back to the expert/influencer who's the author. E.g. you could look up all the quotes of "Sherlock Holmes" or all the fictional quotes of characters written by Arthur Conan Doyle.

comment by wnoise · 2010-04-06T16:26:28.786Z · score: 0 (0 votes) · LW · GW

It seems worth trying, if you want to code it up. While it certainly doesn't make much sense to base predictions about others based on quite possibly incoherent groupings of characters, predicting the other way could be interesting.

But it does occur to me that I could just create a user account and post them there, though that wouldn't let others add quotes.

comment by RobinZ · 2010-04-05T11:37:19.032Z · score: 0 (0 votes) · LW · GW

I generally include the name of the character along with the rest.

comment by wnoise · 2010-04-05T18:03:39.046Z · score: 0 (0 votes) · LW · GW

That doesn't always work -- sometimes it's from an impersonal narrator.

comment by RobinZ · 2010-04-05T19:29:49.521Z · score: 0 (0 votes) · LW · GW

Point - "Narrator", perhaps?

comment by JGWeissman · 2010-04-04T05:08:59.678Z · score: 0 (0 votes) · LW · GW

Fun Fact: My algorithm's most confident prediction about Ben is that with 67% probability, he Disagrees on the issue Is the unconscious philosophical zombie possible?.

(Ben has registered his position on 7 issues.)

comment by JGWeissman · 2010-04-07T04:04:51.166Z · score: 0 (0 votes) · LW · GW

One feature of this method, which clustering does not have, is that if person S consistently Disagrees on issues that person T Mostly Disagrees on, then when person T Mostly Disagrees on an issue, it is evidence that person S will Disagree on that same issue.