Tools for finding information on the internet
post by RomanHauksson (r) · 2023-02-09T17:05:28.770Z · LW · GW · 11 comments
This is a link post for https://roman.computer/finding_information/
Edit 2023-05-09: I recorded a presentation for EA Software Engineers about this post. In it, I demonstrate each of the tools and discuss some extra ones at the end, namely content blockers, userscripts, and alternative front-end websites.
Isn't the internet such a magically useful tool? Thirty years ago, if you wanted to know how many plays Shakespeare wrote, you would have had to physically walk to your local library and find a relevant book. Now, you can find the answer in less than ten seconds, at any time, wherever you are.
However, the internet is not a truthful, superintelligent oracle. Rather, it's a dangerous jungle of knowledge you must learn to navigate if you wish to find the truth. Good information is censored, hidden behind paywalls or within piles of spam, and difficult to differentiate from untrustworthy information. This post won't be a complete guide on how to navigate the world wide web of knowledge, but it will give you some tools I've discovered over the years that you can throw in your digital rucksack to aid your journey.
Search engines
- The great internet sage Gwern Branwen wrote an advanced guide on finding references, papers, and books online.
- The search engines Brave Search and Kagi have the features "Goggles" and "Lenses" respectively, which are presets that filter or re-rank entire categories of websites in your results.
- SearXNG is a highly customizable internet metasearch engine.
- Perplexity uses natural language processing to answer your query with a paragraph (with sources) and allows you to ask followup questions.
- Metaphor allows you to find websites by writing creative and long-form prompts, also using NLP.
- Elicit is a research assistant that helps you find relevant research papers, also using NLP.
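If you run your own SearXNG instance, you can also query it from a script. Here's a minimal sketch, assuming the instance has the JSON output format enabled in its settings (many public instances disable it) and that the response carries a `results` list whose entries have `title` and `url` fields. The instance address and the helper names `build_search_url`, `extract_links`, and `search` are placeholders I made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(instance_url, query):
    """Build a SearXNG search URL that asks for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return instance_url.rstrip("/") + "/search?" + params


def extract_links(payload, limit=5):
    """Pull (title, url) pairs out of a decoded JSON response."""
    results = payload.get("results", [])[:limit]
    return [(r.get("title", ""), r.get("url", "")) for r in results]


def search(instance_url, query, limit=5):
    """Query the instance over HTTP and return the top (title, url) pairs."""
    url = build_search_url(instance_url, query)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_links(json.load(response), limit)
```

Calling `search("http://localhost:8080", "how many plays did Shakespeare write")` would then return up to five title and link pairs, provided an instance is actually listening at that (assumed) address.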
Bypassing restrictions
Sometimes you know exactly where to find a piece of information, but it's locked behind a paywall or deleted from the internet.
- Unddit displays deleted comments and posts on Reddit.
- Internet Archive is a non-profit library of free books, movies, websites, et cetera. It's famous for the Wayback Machine, which displays past archived snapshots of a given URL.
- Bypass Paywalls is a browser extension to help bypass paywalls on selected sites.
- The subreddit r/piracy has a wiki with loads of resources on obtaining copyrighted material for free.
- Anna's Archive is a shadow library metasearch engine that aggregates results from websites that host copyrighted books, academic papers, magazines, et cetera.
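The Wayback Machine can also be checked from a script through the Internet Archive's availability endpoint, which returns the archived snapshot closest to a given date. Here's a minimal sketch using only the Python standard library. The helper names `build_query`, `closest_snapshot`, and `find_snapshot` are my own, and the response shape (an `archived_snapshots` object with a `closest` entry) is what I understand the endpoint to return, so treat it as an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the availability query; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the snapshot URL from a decoded response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(page_url, timestamp=None):
    """Ask the Internet Archive for the snapshot closest to the given date."""
    with urllib.request.urlopen(build_query(page_url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

For example, `find_snapshot("example.com", "20100101")` asks for the capture nearest to January 1, 2010 and returns its URL, or `None` if the page was never archived.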
Trustworthy sources
It is particularly frustrating to find trustworthy knowledge about certain topics because of misaligned incentives: researching which product to buy or which supplements actually work is hard because everyone's trying to sell you something.
- Consumer Reports independently tests consumer products and gives in-depth recommendations. It does not rely on affiliate commissions.
- Examine is a database of research about nutrition and supplements that has no industry ties, sponsorships, or ads.
11 comments
Comments sorted by top scores.
comment by the gears to ascension (lahwran) · 2023-01-30T19:28:16.741Z · LW(p) · GW(p)
Great suggestions. I'd also add
for academics, https://arxivxplorer.com/ is a very solid semantic search engine for arxiv, beats semanticscholar for search; different strengths than metaphor and sometimes still fails to find stuff I know exists.
for academics, https://www.semanticscholar.org/ has a paper recommender, which you use by adding papers to folders and marking them as "give me a feed, please"; it does a great job finding stuff on topics I'm interested in as it comes out. Note that semanticscholar's search leaves much to be desired, their model is best used in recommender mode.
for academics, https://my.paperscape.org/ is mostly not a search engine, but is convenient for mapping out which papers reference which other papers more quickly, and goes together nicely with other items on this list.
---
For general users, https://consensus.app/ is a convenient but limited question-answer search for papers. not as strong as elicit in some ways, and I'd rather check examine, but consensus is a nice stopgap for the great many things examine hasn't had time to investigate.
For general users, https://phind.com/ and https://neeva.com/ and https://komo.ai/ are some other ai search engines, in approximately descending strength. None of these compare to metaphor in strength, imo.
---
I shared a bunch of stuff like these in https://www.lesswrong.com/posts/ozojWweCsa3o32RLZ/list-of-links-formal-methods-embedded-agency-3d-world-models [LW · GW]
comment by ChristianKl · 2023-01-29T14:49:05.670Z · LW(p) · GW(p)
Wirecutter's business model is affiliate commissions, so it has misaligned incentives in a way that Consumer Reports and Examine, which make money from subscriptions, don't.
Replies from: r
↑ comment by RomanHauksson (r) · 2023-01-29T17:54:20.922Z · LW(p) · GW(p)
I was not aware of this. Just edited Wirecutter out, thanks.
comment by niplav · 2023-01-30T12:32:18.919Z · LW(p) · GW(p)
I have found Connected Papers to be quite useful, but they're limited to 3 free queries a month.
comment by lovetheusers (CrazyPyth) · 2023-01-31T02:45:03.852Z · LW(p) · GW(p)
Good AI summarization tools: https://www.towords.io/ https://detangle.ai/
Replies from: lahwran
↑ comment by the gears to ascension (lahwran) · 2023-02-09T06:31:54.736Z · LW(p) · GW(p)
hey might wanna remove towords, looks like we missed a step in checking it - it's really expensive. detangle also costs money but a more manageable amount; that said I've lost my feeling of being impressed by detangle.
comment by MondSemmel · 2023-05-09T16:50:28.925Z · LW(p) · GW(p)
(Note that this post was already posted a couple months ago. The submission date is set to today only because it was accidentally moved to drafts at some point.)
comment by trevor (TrevorWiesinger) · 2023-01-29T14:40:42.891Z · LW(p) · GW(p)
I found this post extremely helpful.
However, there are two things I'm suspicious about. The first is Brave Search and Kagi; Facebook bends over backwards to avoid giving users that kind of control, and this is a norm I've observed throughout web services run by large corporations. Kagi and Brave Search don't seem to have ties to a large corporation, but I'm still suspicious that something that good could exist in the modern era. I'd love to be proven wrong about this.
The other thing is NYT Wirecutter. I remember some pretty suspicious content there, although I never wrote anything down. It was never anywhere near as serious as that one time WSJ wrote an outright propaganda piece praising Roblox for hijacking the minds of millions of 10-year-olds. Maybe NYT innocently followed some trends that were actually started by corrupted reviewers, and that set off my warning bells because I only ever observed Wirecutter.
Replies from: lahwran
↑ comment by the gears to ascension (lahwran) · 2023-01-29T23:57:23.279Z · LW(p) · GW(p)
kagi is a paid service. I don't really understand brave's model.
comment by strikingLoo (luciano-strika-1) · 2023-01-31T02:10:32.994Z · LW(p) · GW(p)
Always happy to see new blogs joining the net, I hope we'll see more posts!
I love Elicit and have used it for random insight porn before. I'd love a post where someone uses these sorts of tools for insights and shows their process step-by-step.
comment by Shmi (shminux) · 2023-01-29T22:15:18.268Z · LW(p) · GW(p)
I'd add that trustworthy news sources are hard to come by. All US media is politicized between antimaskers and antimuskers. All British media is either very left, belongs to News Corp, or is of tabloid quality. Even formerly impartial outlets like Reuters are now leaning one way or the other. AP is still holding on, in most areas. Aljazeera is pretty decent for anything unrelated to Israel.
Sadly, unbiased coverage of the Russia/Ukraine war is virtually non-existent.