Search Engines and Oracles
post by HalMorris · 2014-07-08T14:27:02.792Z · LW · GW · Legacy · 8 comments
Some time ago, I was following a conversation about Wolfram Alpha (http://www.wolframalpha.com/), an attempt to implement a sort of general purpose question answerer, something people have dreamed about computers doing for decades. Despite the theoretical ability to find out virtually anything from the Internet, we seem pretty far from any plausible approximation of this dream (at least for general consumption). My first attempt was:
Q: "who was the first ruler of russia?"
A: Vladimir Putin
It's a problematic question that depends on questions like "When did Russia become Russia", or "What do we count, historically as Russia", or even what one means by "Ruler", and a reasonably satisfactory answer would have had to be fairly complicated -- either that, or the question would have to be reworded to be so precise that one name could serve as the answer.
On another problematic question I thought it did rather well:
Q: what is the airspeed velocity of an unladen african swallow?
What occurred to me though, is that computer science could do something quite useful intermediate between "general purpose question answerer" and the old database paradigm of terms ANDed or ORed together. (Note that what Google does is neither of these, nor should it be placed on a straight line between the two -- but discussion of Google would take me far off topic).
A simple example of what I'd really like is a search engine that matches *concepts*. Does anyone know of such a thing? If it exists, I should possibly read about it and shut up, but let me at least try to be sure I'm making the idea clear:
E.g., I'd like to enter <<rulers of russia>>, and get a list of highly relevant articles.
Or, I'd like to enter <<repair of transmission of "1957 Ford Fairlane">> and get few if any useless advertisements, and something much better than all articles containing the words "repair" "transmission" and "1957 Ford Fairlane" -- e.g., *not* an article on roof repair that happened to mention that "My manual transmission Toyota truck rear-ended a 1957 Ford Fairlane".
It seems to me mere implementation of a few useful connectives like "of", and maybe the recognition of an adjective-noun phrase, and some heuristics like expanding words to *OR*ed lists of synonyms (ruler ==> (president OR king OR dictator ...)) would yield quite an improvement over the search engines I'm familiar with.
This level of simple grammatical understanding is orders of magnitude simpler than the global analysis and knowledge of unlimited sets of information sources, such as a general purpose question answerer would require.
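As a rough illustration of the idea above, here is a minimal sketch in Python. It treats "of" as a connective that splits the query into groups, keeps a quoted phrase together, expands a word into an ORed list of synonyms, and requires every group to appear in the same sentence. The synonym table, the function names, and the same-sentence rule are all assumptions made for illustration, not an existing search engine's behavior.

```python
import re

# Hypothetical, hand-made synonym table (an assumption for illustration only).
SYNONYMS = {
    "rulers": ["ruler", "rulers", "president", "king", "tsar", "dictator"],
    "repair": ["repair", "repairing", "fix", "rebuild"],
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query):
    """Split the query on the connective 'of'.

    Returns a list of groups. Each group is a list of alternative phrases
    (ORed together), and each phrase is a list of tokens.
    """
    groups = []
    for part in re.split(r"\s+of\s+", query.strip()):
        part = part.strip().strip('"')
        alternatives = SYNONYMS.get(part.lower(), [part])
        groups.append([tokenize(alt) for alt in alternatives])
    return groups

def contains_phrase(tokens, phrase):
    """True if the tokens of `phrase` occur consecutively in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def matches(query, document):
    """True if some single sentence contains a phrase from every group."""
    groups = parse_query(query)
    for sentence in re.split(r"[.!?]+", document):
        tokens = tokenize(sentence)
        if all(any(contains_phrase(tokens, p) for p in group) for group in groups):
            return True
    return False

query = 'repair of transmission of "1957 Ford Fairlane"'
roof_article = ("We finished the roof repair last week. My manual transmission "
                "Toyota truck rear-ended a 1957 Ford Fairlane.")
shop_manual = "This guide covers how to rebuild the transmission on a 1957 Ford Fairlane."
```

With these inputs, `matches(query, roof_article)` is false because no single sentence mentions repair, the transmission, and the car together, while `matches(query, shop_manual)` is true. A quoted phrase that itself contains "of" would be split incorrectly; a real parser would need to handle that.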
I'd like to know if anyone else finds this interesting, or knows of any leads for exploring anything related to these possibilities.
By the way, when I entered "rulers of russia" into Wolfram-Alpha, the answer was still Putin, with brief mention of others going back to 1993, so "Russia" seems to be implicitly defined as the entity that has existed since 1993, and there is an attempt at making it an *answer to the (assumed) question* rather than a good list of articles that could shed light on various reasonable interpretations of the phrase.
8 comments
Comments sorted by top scores.
comment by NoSignalNoNoise (AspiringRationalist) · 2014-07-13T19:16:16.142Z · LW(p) · GW(p)
It's interesting to see not just that Wolfram Alpha gave an obviously wrong answer to "who was the first ruler of Russia", but why it gave that answer. It gives an "input interpretation" of: -Russia -Ruler -July 1, 2014
It's clear from that interpretation that Wolfram Alpha incorrectly interpreted the meaning of the word "first" in this context. The challenge at this stage is simply parsing the query, not the more nuanced task of what counts as "Russia" or who counts as a ruler that gallabytes mentioned. Because humans have very little difficulty with this task, we know that it can in principle be automated, but we may be a long way away from being able to do so.
A human (or more intelligent AI) trying to answer this question would also need to be able to deal with the sort of subtleties that gallabytes mentions, but I suspect the query simply doesn't contain enough context to interpret the semantics with high confidence, so it would probably be necessary to ask additional clarifying questions in order to provide the information the asker wants. An oracle AI could likely learn how to resolve some but not all of the ambiguities it faces (both syntactic and semantic) by predicting answers to the clarifying questions it asks and then updating based on the users' answers.
comment by gallabytes · 2014-07-09T15:25:35.544Z · LW(p) · GW(p)
"Despite the theoretical ability to find out virtually anything from the Internet, we seem pretty far from any plausible approximation of this dream"
I'm not convinced this is as easy as you seem to think it is. One of the fundamental problems of all attempts to do natural language programming and/or queries is that natural languages have nondeterministic parsing. There are lots of ambiguities floating about in there, and lots of social modeling is necessary to correctly parse most sentences.
To take your "first ruler of Russia" example, to infer the correct query, you'd need to know:
- That they mean Russia the landmass not Russia the nation-state
- What they mean by "ruler of Russia" (for example, does Kievan Rus count as Russia?)
↑ comment by NoSignalNoNoise (AspiringRationalist) · 2014-07-13T20:09:47.228Z · LW(p) · GW(p)
I did some experimentation on how Wolfram Alpha handles ambiguity in "who is the ruler of $place?" using places with varying degrees of difficulty.
Monarchies
Britain
- Ruler: Elizabeth II (the queen)
- Prime Minister: David Cameron
- President: "Wolfram|Alpha didn't understand your query. Showing instead results for query: president"
- Chancellor: "Using closest Wolfram|Alpha interpretation: chancellor"; listed Werner Faymann as Chancellor of Austria and Angela Merkel as Chancellor of Germany; "who is chancellor of the exchequer" yielded the same result
Scotland
- Ruler: no result found; rulers of Bahrain listed
- Prime minister: no result found; got statistics on clergy in the US
- First minister: Alex Salmond
Canada
- Ruler: Elizabeth II
Spain:
- Ruler: Felipe VI (the king)
- Prime Minister: Mariano Rajoy
- President (the Prime Minister's official title is "Presidente del Gobierno", which I believe can be translated as "President of the Government" or "President of the Cabinet"): no answer
Jordan:
- Ruler: Abdullah II (the king)
- Prime Minister: Abdullah Ensour
Non-monarchies
America
- Ruler: Barack Obama
Germany
- Ruler: Joachim Gauck (the figurehead president)
- Prime Minister: "Wolfram|Alpha doesn't understand your question"; gave statistics about the number of clergy in the United States
- Chancellor: Angela Merkel
Ireland
- Prime Minister (official title is Taoiseach): Enda Kenny
- Ruler: Michael D. Higgins (figurehead president)
France
- Ruler: Francois Hollande
China
- Ruler: Xi Jinping
Republic of China:
- Ruler: Ma Ying-jeou
Korea:
- Ruler: "Wolfram|Alpha doesn't understand your query", with a list of people who have held the title of Bahrain
South Korea:
- Ruler: Park Geun-hye
North Korea:
- Ruler: Kim Jong Il (not Kim Jong Un), along with some rather odd chronology (apparently the position was vacant from 1994 to 1997); his end date is listed as Dec. 17, 2011; no mention of Kim Jong Un
Georgia
- Ruler: (none in office); the current Governor, Nathan Deal, is listed as a past governor with an end date of today, as are Senators Johnny Isakson and Saxby Chambliss, neither of whom has ever been governor; former governor Roy Barnes is listed with the correct dates, but his successor Sonny Perdue is not listed; the page also states that it is assuming "Georgia" is referring to the US state and gives a link to look up the country instead
Republic of Georgia
- Ruler: Giorgi Margvelashvili (President)
New Mexico
- Ruler: no information available
- Governor: Susana Martinez
Mexico
- Ruler: Enrique Peña Nieto
- Governor: Wolfram|Alpha didn't understand your query; no results given (exceeded max computation time)
"Who is the prime minister?": exceeded max computation time
General Observations
- The "ruler" of a place is determined based solely on the title
  - If there is both a monarch and a prime minister, the monarch is listed as the ruler, whether the monarch has more power (Jordan) or the prime minister does (Britain, Canada, Spain)
  - If there is both a president and a prime minister, the president is listed as the ruler, whether the president is actually in charge (France) or is a figurehead (Germany)
- Close matches don't count (prime minister of Germany, president of Spain)
- It tries to answer every question but says when it doesn't understand something (president of Britain, ruler of Korea)
- Sometimes it gets the facts blatantly wrong, but almost right (Kim Jong Il still rules North Korea; Georgia has no governor)
- It handles both very slightly ambiguous (China vs Republic of China; Mexico vs New Mexico) and moderately ambiguous ("Georgia" as State of Georgia vs Republic of Georgia) reasonably
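The title-only pattern in these observations can be sketched as a fixed priority lookup. This is only a guess that reproduces the results listed in this comment, not Wolfram|Alpha's actual logic; `TITLE_PRIORITY`, `OFFICEHOLDERS`, `ruler_by_title`, and `office_holder` are hypothetical names, and the data are just the answers reported above.

```python
# Priority inferred from the observations above (an assumption, not
# Wolfram|Alpha's documented behavior): monarch beats president,
# president beats prime minister, regardless of who holds real power.
TITLE_PRIORITY = ["monarch", "president", "prime minister"]

# Officeholders as reported in this comment (July 2014).
OFFICEHOLDERS = {
    "Britain": {"monarch": "Elizabeth II", "prime minister": "David Cameron"},
    "Jordan": {"monarch": "Abdullah II", "prime minister": "Abdullah Ensour"},
    "Germany": {"president": "Joachim Gauck", "chancellor": "Angela Merkel"},
    "France": {"president": "Francois Hollande"},
}

def ruler_by_title(place):
    """Return the holder of the highest-priority title, or None if unknown."""
    offices = OFFICEHOLDERS.get(place)
    if offices is None:
        return None
    for title in TITLE_PRIORITY:
        if title in offices:
            return offices[title]
    return None

def office_holder(place, title):
    """Exact-title lookup: close matches (e.g. 'prime minister' for a
    chancellor) deliberately return None, mirroring the observed behavior."""
    return OFFICEHOLDERS.get(place, {}).get(title)
```

Under this sketch, `ruler_by_title("Germany")` returns the figurehead president rather than the chancellor, and `office_holder("Germany", "prime minister")` returns nothing, which matches the two weaknesses noted above.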
↑ comment by HalMorris · 2014-07-10T01:24:50.454Z · LW(p) · GW(p)
Really, I'm proposing doing something that has to be much easier than the problem people seem to get fixated on, i.e. the general purpose question answering machine, which was a staple of science fiction decades ago (leading to the parody Q: "What is the meaning of life?" A: 42). Besides which the goal of one crisp answer seems aimed at a childish mentality -- except in the (rarer than we may think) cases when there really is one crisp answer.
After all, I wrote "What occurred to me though, is that computer science could do something quite useful intermediate between "general purpose question answerer" and the old database paradigm of terms ANDed or ORed together."
Between two people, either there would be some implicit understanding (like We're talking about pre-USSR because that's the subject of the seminar we're in) or the question-ee might have to say "What are the parameters of the Russia you're talking about?"
Then again the semi-smart search engine I'd like to see could just decline to resolve ambiguities, and return all articles treating any reasonable interpretation of the phrase, and make it the user's job to add qualifying phrases as needed.
I am dreaming up a Simpsons episode in which a computer can convince a panel of experts that it is Bart Simpson, and the ensuing debate as to whether that was "passing the Turing Test".
↑ comment by chaosmage · 2014-07-10T11:50:52.811Z · LW(p) · GW(p)
the semi-smart search engine I'd like to see could just decline to resolve ambiguities
Why? That seems really unhelpful. I'd much prefer the engine to answer like a human expert, who habitually starts with "That depends on what you mean by..." I imagine it could assess its confidence in its choices for interpretation, discard any with less than (say) 10% probability, and if more than one remains, give them to me in an ordered list, to click on the one I mean. Kind of like a Wikipedia disambiguation page. (If more than one term needs clarification, do both on the same page.) I'm confident this would solve the issue you describe, at least in cases where the confidence assessment isn't very wrong, and it sounds to me very valuable. Because when you say:
the goal of one crisp answer seems aimed at a childish mentality
...you aren't describing children, you are describing most people.
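The disambiguation step described in this comment can be sketched as follows: drop interpretations below a confidence threshold, answer directly if only one survives, and otherwise offer the survivors in order of confidence. The function name, the return format, and the confidence numbers are all made up for illustration; only the 10% cutoff comes from the comment.

```python
def disambiguate(interpretations, threshold=0.10):
    """Filter and rank candidate interpretations of a query.

    `interpretations` maps a label to a confidence in [0, 1].
    Returns ("answer", [label]) if exactly one candidate survives the
    threshold, ("choose", [labels...]) ordered by confidence if several
    survive, and ("unknown", []) if none do.
    """
    kept = sorted(
        ((label, p) for label, p in interpretations.items() if p >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    labels = [label for label, _ in kept]
    if not labels:
        return ("unknown", [])
    if len(labels) == 1:
        return ("answer", labels)
    return ("choose", labels)

# Hypothetical confidences for what "Russia" might mean in a query.
candidates = {
    "Russian Federation": 0.55,
    "Russian Empire": 0.22,
    "Kievan Rus": 0.15,
    "Tsardom of Russia": 0.08,
}
result = disambiguate(candidates)
```

Here the lowest-confidence reading is discarded and the remaining three are offered as an ordered list, much like a disambiguation page. The hard part, which this sketch assumes away, is producing confidence estimates that are not badly wrong.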
↑ comment by HalMorris · 2014-07-10T14:44:21.847Z · LW(p) · GW(p)
I wrote "the goal of one crisp answer seems aimed at a childish mentality -- except in the (rarer than we may think) cases when there really is one crisp answer."
which chaosmage unfortunately truncated.
Anyway, more often than not, I at any rate, want food for thought, not "the answer".
An earlier example from my initial article: <<repair of transmission of "1957 Ford Fairlane">>, to which I'd want to add articletype:diy. An expert might indeed provide one crisp answer, somewhat like the "feeling lucky" answer from Google, but I'd feel I was missing something. I'd like to see every relevant answer with some attempt at ranking them. I might prefer a YouTube video, or I might prefer text, diagrams, and the occasional photo. I might watch a video and then print out the text version to take out to the garage. Some articles might reference tools that I never heard of, and doubt I could lay my hands on, while others provide ways to do it without those tools. I might watch the video with the special tool and the video without, and conclude the latter takes more dexterity than I have and that I'd better find a way to borrow that tool.
To the extent people want just an answer, or worse, "the answer" rather than "food for thought" it may be at least in part due to the cultural environment saying "We have the answer!" "Answers at 6:00", and so on. Then again, in some situations I'd just want to know the answer, or the most popular match.
comment by [deleted] · 2015-05-24T05:10:18.063Z · LW(p) · GW(p)
sd