An even more modest search engine proposal

post by HalMorris · 2014-07-26T02:42:35.955Z · 7 comments

How much AI technique could it possibly take for Google (or something better) to do a decent job with

speechby:obama   attitude:positive   "Saul Alinsky".

I.e., "speechby:" and "attitude:" don't exist as search operators, but I believe they could be implemented fairly accurately. In this case, they would let us see whether there are any instances of Obama praising Saul Alinsky.
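Neither operator exists in any real search engine, so the following is only a sketch of how such a query might be interpreted, assuming a corpus in which each quote has already been attributed to a speaker and labelled with an attitude (exactly the hard parts the proposal asks about). `Quote`, `parse_query` and `search` are hypothetical names; `shlex.split` keeps the quoted phrase "Saul Alinsky" together as one term.

```python
import shlex
from dataclasses import dataclass

# Hypothetical operator names; no real search engine is assumed to support them.
KNOWN_OPERATORS = {"speechby", "attitude"}


@dataclass
class Quote:
    speaker: str   # assumed already attributed (e.g. "obama")
    attitude: str  # assumed classifier output: "positive", "negative" or "neutral"
    text: str


def parse_query(query: str) -> tuple[dict[str, str], list[str]]:
    """Split a query into operator filters and free-text phrases."""
    filters: dict[str, str] = {}
    terms: list[str] = []
    for token in shlex.split(query):  # keeps "Saul Alinsky" as one token
        name, sep, value = token.partition(":")
        if sep and name.lower() in KNOWN_OPERATORS:
            filters[name.lower()] = value.lower()
        else:
            terms.append(token)
    return filters, terms


def search(quotes: list[Quote], query: str) -> list[Quote]:
    """Return quotes matching every operator filter and containing every phrase."""
    filters, terms = parse_query(query)
    hits = []
    for q in quotes:
        if "speechby" in filters and q.speaker.lower() != filters["speechby"]:
            continue
        if "attitude" in filters and q.attitude.lower() != filters["attitude"]:
            continue
        # every free-text phrase must appear, case-insensitively, in the quote
        if all(t.lower() in q.text.lower() for t in terms):
            hits.append(q)
    return hits
```

For the query above, `parse_query` returns `({'speechby': 'obama', 'attitude': 'positive'}, ['Saul Alinsky'])`. The parsing is trivial; all the real work is hidden in how `speaker` and `attitude` get filled in.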

claims such quotes exist, but their one attempt to demonstrate it is laughable: something vaguely like a paraphrase of an Alinsky statement, which in fact has the opposite sense of what the supposed "original" meant. Yet I think most of the world, not just conservatives, will tend not to question Obama's "debt" to Alinsky, if they have any idea who Alinsky is at all, simply because of the sheer number of times it has been said or implied. For the other shoe dropping, false quotes that help demonize Alinsky, see .

The point isn't to defend Obama. It is that I think the world would work better if the ratio of

     ability to find verifiable facts pertinent to political discussion
    _________________________________________________
      supply of highly opinionated and slanted "news"

could be raised by, say, an order of magnitude.

So many assertions are made that are probably untrue, yet incredibly difficult for the average person to disprove. In the Internet era, the personal cost of writing an almost free-associative screed about a political point is very low, while the personal cost of finding a substantial body of pertinent facts is awfully high.

This is not to say the "average person" will look for facts to confirm or contradict what they read. But much of what they read is written by bloggers, some of whom are sincere and would become users of such resources. And I do believe that the emotional reward of finding a nugget of truth, versus the current pain of often fruitless searching, would affect people's thinking habits: perhaps only slightly at first, but increasingly over time.

The particular proposal merely illustrates one of many sorts of resource that are missing or hard to find.  Ideas for other such resources would be welcome.

7 comments


comment by Punoxysm · 2014-07-26T03:04:38.446Z

Take articles mentioning both Obama and Alinsky.

Extract quotes by Obama: sometimes somewhat difficult, depending on article structure. In Obama's case, simply due to the volume of his speeches, it would be easy to extract a large high-confidence (but incomplete) corpus.
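One cheap way to get a "high-confidence (but incomplete)" corpus is to accept only passages that match a few fixed attribution patterns. The sketch below assumes straight double quotes and just two patterns (`"...," Obama said` and `Obama said, "..."`); the regexes and `extract_obama_quotes` are illustrative, not an existing tool, and deliberately trade recall for precision.

```python
import re

# Two common attribution patterns (an assumption; real news copy varies far more):
#   "Quoted text," Obama said.    -> quote first
#   Obama said, "Quoted text."    -> speaker first
QUOTE_FIRST = re.compile(
    r'"([^"]+)"\s*,?\s*(?:President\s+)?Obama\s+(?:said|says|told)\b')
SPEAKER_FIRST = re.compile(
    r'(?:President\s+)?Obama\s+(?:said|says|told\s+\w+)\s*[,:]\s*"([^"]+)"')


def extract_obama_quotes(article: str) -> list[str]:
    """Return quoted passages explicitly attributed to Obama (high precision, low recall)."""
    found = []
    for pattern in (QUOTE_FIRST, SPEAKER_FIRST):
        for m in pattern.finditer(article):
            # trim the trailing comma/period that sits inside the quotation marks
            found.append(m.group(1).strip(" ,."))
    return found
```

Anything attributed indirectly ("the president argued that ...") is simply missed, which is the "incomplete" half of the trade-off.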

Find sentences within 2 words of references to Alinsky. Alternatively, use plagiarism-detection software to detect near-quotes of Alinsky in Obama's speeches (not necessarily plagiarism; this is a known application of such software).
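Plagiarism detectors commonly rely on overlapping word n-grams ("shingles"). A minimal stand-in, not any particular product, is to measure what fraction of a speech sentence's word trigrams also appear in some source passage (its containment); a high score flags a possible near-quote. The function names, n = 3 and the 0.5 threshold are assumptions to be tuned.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams ("shingles") of a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(sentence_sh: set, passage_sh: set) -> float:
    """Fraction of the sentence's shingles that also occur in the passage."""
    if not sentence_sh:
        return 0.0
    return len(sentence_sh & passage_sh) / len(sentence_sh)


def near_quotes(source_passages: list[str], speech_sentences: list[str],
                n: int = 3, threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """(sentence, passage, score) triples at or above the threshold, best first."""
    sources = [(p, shingles(p, n)) for p in source_passages]
    hits = []
    for sent in speech_sentences:
        s_sh = shingles(sent, n)
        for passage, p_sh in sources:
            score = containment(s_sh, p_sh)
            if score >= threshold:
                hits.append((sent, passage, score))
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits
```

Containment is used instead of Jaccard similarity because a one-sentence echo of a long source passage would otherwise score low merely because the passage is longer.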

Apply known sentiment-analysis techniques. These are probably insufficient, because political speech is structured differently from restaurant and product reviews.
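To make the difficulty concrete, here is the simplest kind of sentiment analysis: a hand-made word list with a crude negation rule. The lexicon, window size and `attitude` function are all hypothetical; trained models do better, but political speech (irony, concession, quoting an opponent) can defeat both.

```python
import re

# Tiny illustrative lexicon; a real system would use a trained model or a curated lexicon.
POSITIVE = {"admire", "admired", "great", "inspired", "inspiring", "praise", "respect", "wise"}
NEGATIVE = {"dangerous", "disagree", "radical", "reject", "wrong"}
NEGATORS = {"not", "never", "no", "don't", "didn't"}


def attitude(sentence: str, window: int = 3) -> str:
    """Label a sentence by summing word polarities, flipping a word's polarity
    when a negator appears among the `window` words before it."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, w in enumerate(words):
        polarity = 1 if w in POSITIVE else -1 if w in NEGATIVE else 0
        if polarity and NEGATORS.intersection(words[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A sentence like "Some call him a great organizer; I disagree" nets zero and comes out neutral, which is exactly the kind of structure that trips up review-style sentiment analysis.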

Have a human manually look over as many of the top candidate quotes as possible. This is still a lot easier than looking over the whole set of Obama-related speeches and articles.
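The human-review step amounts to ordering candidates so a reviewer's limited time goes to the most promising ones. The `Candidate` fields and the weights in `priority` are assumptions for illustration; `heapq.nlargest` returns the top `k` without sorting the whole list.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Candidate:
    sentence: str
    near_quote_score: float  # e.g. shingle containment in [0, 1] (assumed)
    mentions_target: bool    # names the target (e.g. "Alinsky") explicitly
    attitude: str            # classifier output; unreliable for political speech


def priority(c: Candidate) -> float:
    """Hypothetical weighting: explicit mentions first, then closeness of match;
    the classifier's label only nudges the order."""
    score = c.near_quote_score
    if c.mentions_target:
        score += 2.0
    if c.attitude == "positive":
        score += 0.5
    return score


def review_queue(cands: list[Candidate], k: int = 50) -> list[Candidate]:
    """The k highest-priority candidates, best first, for a human to read."""
    return heapq.nlargest(k, cands, key=priority)
```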

So this is easily done, just not easily done at scale, because of a couple of specific barriers. Quote attribution and summarizing the context of a quote, allusion or reference are probably the two biggest technical barriers.

Replies from: HalMorris
comment by HalMorris · 2014-07-26T03:36:36.827Z

I do appreciate that.

But I'm really interested in resources so easy to use that they would seduce the blogger to whom all of that is Greek.

I am interested, and would like to find others who are interested, in finding modest ways to make the electorate more rational, which I think is really in our best interest, rather than only making ourselves super-rational Bayesian black belts and all that, valid as that pursuit is.

Replies from: Punoxysm
comment by Punoxysm · 2014-07-26T22:32:20.646Z

I think a really excellent searchable quote bank for politicians, eventually with some degree of easy cross-referencing or an easy "find similar quotes" feature, would be the place to start.

But how much that really elevates discourse is debatable.
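A minimal in-memory sketch of such a quote bank, assuming each quote is stored with its speaker and a pointer to its source: an inverted index answers "which quotes contain these words", and a crude shared-word count stands in for "find similar quotes". `QuoteBank` and its methods are hypothetical names; a real system would need stemming, stop-word handling and much better similarity measures.

```python
import re
from collections import defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class QuoteBank:
    """Quotes stored as (speaker, text, source), with an inverted word index."""

    def __init__(self) -> None:
        self.quotes = []               # list of (speaker, text, source) tuples
        self.index = defaultdict(set)  # word -> set of quote ids

    def add(self, speaker: str, text: str, source: str) -> int:
        qid = len(self.quotes)
        self.quotes.append((speaker, text, source))
        for word in set(tokenize(text)):
            self.index[word].add(qid)
        return qid

    def search(self, words: str, speaker: Optional[str] = None) -> list[int]:
        """Ids of quotes containing every word, optionally limited to one speaker."""
        terms = tokenize(words)
        if not terms:
            return []
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        if speaker is not None:
            ids = {i for i in ids if self.quotes[i][0] == speaker}
        return sorted(ids)

    def similar(self, qid: int, top: int = 5) -> list[int]:
        """Other quotes ranked by how many distinct words they share with quote qid."""
        shared = defaultdict(int)
        for word in set(tokenize(self.quotes[qid][1])):
            for other in self.index[word]:
                if other != qid:
                    shared[other] += 1
        return sorted(shared, key=lambda o: (-shared[o], o))[:top]
```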

comment by ChristianKl · 2014-07-27T23:44:20.524Z

A speechby: operator would also be very useful for searching forums like LessWrong, to find out what a specific poster has said about a subject in the past.

Replies from: HalMorris
comment by HalMorris · 2014-07-28T02:25:07.803Z

On LessWrong in particular it's easy: just click on your name link. That way I can see you've been pitching in on the thread "Another "LessWrongers are crazy" article - this time on Slate", and I think the record goes back years, probably to when you became a member. If you mean other forums (fora?) opaque to us, maybe. I'm sure the NSA is doing that, compiling profiles of what people say across different forums, insofar as the IDs can be matched up. However:

1) What I have in mind would mine credible sources for quotes of public figures.

2) I intended it more as an illustration of the main point, i.e.: I think the world would work better if the ratio of

     ability to find verifiable facts pertinent to political discussion
    _________________________________________________
      supply of highly opinionated and slanted "news". 

could be raised by, say, an order of magnitude.

A major component of "ability to find" would be ease of finding it. Most people follow some approximation of the path of least resistance when "surfing" the net.

The Wikimedia people might eventually encompass it. For now they collect "quotable quotes" and don't try for a comprehensive set of speeches and writings. There are copyright issues there, although something like what Google Books does might work, i.e. a database of pointers to semi-accessible information sources.

A more ambitious idea would be to design a sort of protocol object for tabular statistical data that would represent some kind of "standard of respectability". When someone presented a summary graph excerpted from some source, the "proper" way to do it would be as a defined operation on the protocol object (perhaps partly a sort of "SQL Lite"), and anyone reading the article could pull the object out of that context and turn it this way and that, so they weren't stuck with the slanted and cherry-picked print-bite.
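One way to read the "protocol object" idea: a table that carries its source and a log of every operation applied to it, where each derived view keeps a link back to the view it came from. A chart in an article would then be one view, and a reader could go back to `root()` and apply different operations. `Dataset`, `where`, `sum_by` and `root` are hypothetical names, and the two operations merely stand in for a "SQL Lite" subset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical 'protocol object': a table that remembers its source,
    the operations applied to it, and the view it was derived from."""
    source: str                      # citation or pointer to the original data
    columns: tuple
    rows: tuple
    lineage: tuple = ()              # human-readable log of operations so far
    parent: Optional["Dataset"] = None

    def _derive(self, columns, rows, step):
        return Dataset(self.source, columns, rows, self.lineage + (step,), self)

    def where(self, column, value):
        """Keep rows whose `column` equals `value`."""
        i = self.columns.index(column)
        kept = tuple(r for r in self.rows if r[i] == value)
        return self._derive(self.columns, kept, f"where {column} == {value!r}")

    def sum_by(self, key, value):
        """Total `value` for each distinct `key`, sorted by key."""
        k, v = self.columns.index(key), self.columns.index(value)
        totals = {}
        for r in self.rows:
            totals[r[k]] = totals.get(r[k], 0) + r[v]
        return self._derive((key, f"sum({value})"), tuple(sorted(totals.items())),
                            f"sum {value} by {key}")

    def root(self):
        """The full original table, so a reader can try other operations."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node
```

The design choice is that an excerpt never loses its provenance: the published view and the untouched original travel together.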

Replies from: ChristianKl, ChristianKl
comment by ChristianKl · 2014-07-28T10:37:19.957Z

Judging politicians by their quotes is a bad idea anyway. The quotes in a political speech are what the politician's speechwriter thought the audience wanted to hear.

I find websites such as factcheck to be a lot more valuable for the political discourse.

comment by ChristianKl · 2014-07-28T10:35:24.962Z

> It's easy on Lesswrong in particular by just clicking on your name link, so I can see you've been pitching in on the thread "Another "LessWrongers are crazy" article - this time on Slate", and I think the record goes back years -- probably since you've been a member. If

It's easy to get a list of all posts. It's not easy to search them automatically in a way that distinguishes between a person having said something himself and a person responding to someone else having said it.
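Separating a person's own words from what they quote is easy only if quotations are marked up. Assuming a forum that marks quoted lines with a leading `>` (the Markdown blockquote convention), a sketch could discard those lines before checking who said what. `Comment`, `own_text` and `said_by` are hypothetical names, and quotes pasted without markup would still be misattributed.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    body: str


def own_text(body: str) -> str:
    """Drop quoted lines (those starting with '>'), keeping the author's own words."""
    return "\n".join(line for line in body.splitlines()
                     if not line.lstrip().startswith(">"))


def said_by(comments: list[Comment], author: str, phrase: str) -> list[Comment]:
    """Comments where `author` used `phrase` in their own words, not merely in a quote."""
    phrase = phrase.lower()
    return [c for c in comments
            if c.author == author and phrase in own_text(c.body).lower()]
```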