AI research assistants competition 2024Q3: Tie between Elicit and You.com
post by Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z
Summary
I make a large part of my living performing literature reviews to answer scientific questions. For years AI has been unable to do anything to lower my research workload, but back in August I tried Perplexity, and it immediately provided value far beyond what I’d gotten from other tools. This wasn’t a fair comparison, because I hadn’t tried any other AI research assistant in months, which is decades in AI time. In this post I right that wrong by running two test questions through every major tool, plus a smaller tool recommended in the comments of the last post.
Spoilers: the result was a rough tie between You.com and Elicit. Each placed first on one task and was in the top three on the other.
Tasks + Results
Tl;dr:
- You.com had a small edge in searching for papers, followed by Elicit and Google Scholar. ChatGPT was absolute garbage.
- Elicit, Perplexity, and You.com all surfaced the key piece of information when asked for analysis, with Elicit’s answer being the most concise. None of the other tools managed this.
- You.com and Perplexity were tied for favorite UI, but I haven’t played with You.com very much.
- You.com boasts a larger list of uses than Perplexity (which is narrowly focused on research), but I haven’t tried them out.
Finding papers on water gargling as an antiviral
I’m investigating gargling with water (salt or tap) as a potential antiviral. I asked each of the tools to find relevant papers for me.
ChatGPT was asked several versions of the question as I homed in on the right one to ask. Every other tool was asked “Please list 10 scientific papers examining gargling with water as a prophylactic for upper respiratory infections. Exclude nasal rinsing”. This is tricky because almost all studies on gargling salt water include nasal rinsing, and because saline is used as a control in many gargling studies.
Every tool correctly returned 10 results except for Elicit and Google Scholar, which by design will let you load papers indefinitely. In those cases I used the first 10 results.
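(I scored real vs. hallucinated citations by hand. If you wanted to automate a first pass, a minimal sketch like the one below against Crossref’s public search API could flag titles that don’t resolve to any real record. To be clear, this is not what I ran: the helper, the fuzzy-match threshold, and the example title are all illustrative.)

```python
import difflib
import requests

def plausibly_real(title: str) -> bool:
    """Rough first-pass check: does a citation title an AI tool returned
    match a real record in Crossref? Hand-checking remains the gold standard."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.title": title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return False
    top_title = (items[0].get("title") or [""])[0]
    # Fuzzy match: hallucinated titles usually only loosely resemble
    # whatever Crossref's best hit happens to be. 0.9 is an arbitrary cutoff.
    ratio = difflib.SequenceMatcher(None, title.lower(), top_title.lower()).ratio()
    return ratio > 0.9

# Illustrative example title (a well-known gargling RCT, not necessarily
# one any of the tools returned):
print(plausibly_real(
    "Prevention of upper respiratory tract infections by gargling: a randomized trial"
))
```

Anything the check flags still needs a human look: Crossref misses some venues, and fuzzy title matching has false positives and false negatives in both directions.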
| Tool | Real, relevant results | Probable hallucinations | Notes |
|---|---|---|---|
| Perplexity (initial) | ? | | The formatting was bad, so I asked Perplexity to fix it |
| Perplexity (after asking it to reformat) | 4 | 2 | |
| ChatGPT 4o, asking for “papers” without specifying “scientific” | 0 | | Unusable |
| ChatGPT 4o, specifying “scientific papers” about gargling as a treatment | 2 | 8 | |
| ChatGPT 4o, specifying scientific papers about gargling as a prophylactic | 0 | | Unusable |
| ChatGPT o1 | 1 | 7 | Citation links went to completely unrelated papers |
| Claude 3.5 Sonnet | 2 | 2 | |
| Elicit | 3 | 1 | |
| You.com | 4 + 2 partial credits | 0 | |
| Google Scholar | 4 | 0 | Not AI |
You can see every response in full in this Google Doc.
I did not ask You.com for a picture, but it gave me one anyway. It did not receive even partial credit for this.
Hepcidin
My serum iron levels went down after a series of respiratory illnesses, and on a lark I asked Perplexity if this could be related. Perplexity pointed me towards the hormone hepcidin and this paper, suggesting that respiratory illness could durably raise hepcidin and thus lower blood iron. Knowledge of hepcidin pointed me in the right direction to find a way to lower my hepcidin and thus raise my iron (this appears to be working, although I don’t want to count chickens before the second set of test results), so I was very impressed. This was one of two initial successes that made me fall in love with Perplexity.
I asked the other AI tools the same question. Elicit gave a crisp answer highlighting exactly the information I wanted and nothing else. Perplexity gave a long meandering answer but included hepcidin in its first bullet point. You.com gave an even longer answer in which hepcidin was included but hard to find. Everyone else gave long meandering answers that did not include hepcidin and so were worthless.
You can see the full results in the same Google Doc.
(Lack of) Conflict of interest
I received no compensation from any of the companies involved. I have social ties to the Elicit team and have occasionally focus grouped for them (unpaid). Months or possibly years ago I mentioned my desire to do a multitool comparison to an Elicit team member. At the time they offered me a free month to do the comparison, but their pricing structure has since made this unnecessary, so they’ll find out about this post when it comes out. I have Perplexity Pro via a promotion from Uber.
Conclusions
After seeing these results I plan on playing with You.com more. If the UI and expanded uses turn out like I hope, I might be loyal to it for as many as three months before it’s been surpassed.
There are two major features I’m looking for before I could consider giving up reading papers myself (or sending them to my statistician): determining whether a statistical tool was appropriate for the data, and whether an experimental design was appropriate for the question. I didn’t even bother to formally test these this round, but it wouldn’t shock me if we got there soon.
Comments
comment by Roman Leventov · 2024-10-14T23:58:45.738Z · LW(p) · GW(p)
I think Undermind.ai is much more useful for searching for concepts and ideas in papers, rather than extracting tabular info à la Elicit. Nominally Elicit can do the former too, but it’s quite bad at it in my experience.
comment by Paritosh Shah (paritosh-shah) · 2024-10-12T17:45:40.226Z · LW(p) · GW(p)
For research papers, I’ve been using Consensus.
I like the meta-analysis-style summary of the results. For instance, I recently searched “backfire effect”, and knowing that it failed to replicate was quite helpful: https://consensus.app/results/?q=Backfire%20effect&synthesize=on&copilot=on