Should I fundraise for an open source search engine?
post by samuelshadrach (xpostah) · 2025-03-23T13:04:16.149Z · LW · GW · No comments
This is a question post.
Contents
Answers 1 Robert Cousineau None No comments
IMO it is possible to run an open source search engine on a consumer PC with higher search accuracy than Google.[1]
Benefits of open source search engine:[2]
- provides a competitor to Big Tech, so their governance decisions on the internet will be less extractive. Most tech companies are ultimately doing some sort of search, be it for partners or jobs or restaurants or whatnot.
- can build applications that are unlikely to be built by Big Tech. For example, new forms of government can be built if you have open source search and uncensorable data as primitives.
(There are more benefits, I haven't yet figured out the best sales pitch for this but I could come up with something if given time.)
I will need around $1M in funding to build this, and since I would prefer to open source it, it will most likely have to be non-profit funding.
I am open to putting in some effort to increase awareness and network more if I believed I would get funding. But I'm currently not that optimistic, hence I wanted a second opinion.
[1] The trick is to convert text into embeddings and embeddings into locality-sensitive hashes. 1000 bytes of plaintext -> 3-byte hash, so Common Crawl at 2 PB -> 6 TB index. Search latency under 100 ms is possible on consumer HDDs. I can provide more technical details if anyone wants.
[2] Obligatory comment for LW: this project has zero benefit if timelines are short (ASI by 2030). I am not betting on short timelines, though.
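For anyone who wants the hashing trick from footnote 1 made concrete, here is a minimal sketch. It assumes random-hyperplane (sign) hashing as the locality-sensitive hash family, which the post does not specify, and it uses toy vectors in place of real text embeddings. All function names (`make_hyperplanes`, `lsh_hash`, `hamming`, `index_size_bytes`) are hypothetical and only for illustration. The last function just redoes the index size arithmetic: one 3-byte hash per 1000 bytes of plaintext.

```python
import random


def make_hyperplanes(dim, n_bits=24, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumed LSH family: sign hashing)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_hash(embedding, hyperplanes):
    """Map an embedding to a compact hash: one bit per hyperplane (24 bits = 3 bytes)."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, embedding))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    # Assumes the number of hyperplanes is a multiple of 8.
    return bits.to_bytes(len(hyperplanes) // 8, "big")


def hamming(a, b):
    """Number of differing bits; a smaller distance suggests more similar embeddings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def index_size_bytes(corpus_bytes, chunk_bytes=1000, hash_bytes=3):
    """Index size if every chunk_bytes of plaintext becomes one hash_bytes hash."""
    return corpus_bytes // chunk_bytes * hash_bytes


# Toy usage: stand-in vectors, not output from a real embedding model.
planes = make_hyperplanes(dim=8)
doc = [0.3, -1.2, 0.7, 0.0, 2.1, -0.4, 0.9, 1.5]
query = [x * 2.0 for x in doc]  # same direction, so the sign bits match
print(len(lsh_hash(doc, planes)), hamming(lsh_hash(doc, planes), lsh_hash(query, planes)))
print(index_size_bytes(2 * 10**15))  # 2 PB of plaintext
```

A real system would still need an embedding model and a way to look up nearby hashes on disk quickly; this sketch covers only the hash step and the size estimate.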
Answers
Kagi seems to fully satisfy "provides a competitor to Big Tech" as well as any non-Big-Tech competitor can be expected to (actively and consistently growing, good product, etc.).
I do not believe they are open source, but they certainly seem less censorious.
I would not personally consider this a reasonable use of money or time.
↑ comment by samuelshadrach (xpostah) · 2025-04-11T08:09:19.847Z · LW(p) · GW(p)
Open source is a requirement for me, as I want to:
- search datasets that a big company would not legally be allowed to search, such as documents leaked by whistleblowers
- search on an airgapped machine - so the whole world doesn't get to know what a team of political dissidents is searching for, for example
"I would not personally consider this a reasonable use of money or time."
Fair