Should I fundraise for open source search engine?

post by samuelshadrach (xpostah) · 2025-03-23T13:04:16.149Z · LW · GW · No comments

This is a question post.

Contents

  Answers
    1 Robert Cousineau

IMO it is possible to run an open source search engine on a consumer PC with higher search accuracy than Google.[1]

Benefits of an open source search engine:[2]

(There are more benefits, I haven't yet figured out the best sales pitch for this but I could come up with something if given time.)

 

I will need around $1M in funding to build this, and since I would prefer to open source it, it will most likely have to be non-profit funding.

I would be open to putting in some effort to raise awareness and network more if I believed I would get funding. But I'm currently not that optimistic, so I wanted a second opinion.

 

Prediction

 

  1. ^

    The trick is to convert text into embeddings, and the embeddings into locality-sensitive hashes. 1000 bytes of plaintext -> 3-byte hash, so Common Crawl at 2 PB -> 6 TB index. Search with <100 ms latency is possible on consumer HDDs. I can provide more technical details if anyone wants.

  2. ^

    Obligatory comment for LW. This project has zero benefit under short timelines (ASI by 2030); I am not betting on short timelines, though.
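
To make footnote 1 a bit more concrete, here is a minimal sketch of the embedding -> locality-sensitive hash step and the index-size arithmetic. It assumes random-hyperplane (sign-of-projection) hashing, which is one common LSH scheme and not necessarily the one intended in the post. The embedding model itself is left out, and all function names are hypothetical.

```python
import random

HASH_BITS = 24  # 3 bytes, matching the "3-byte hash" figure in footnote 1


def make_hyperplanes(dim, bits=HASH_BITS, seed=0):
    """Draw `bits` random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def lsh_hash(embedding, hyperplanes):
    """One bit per hyperplane: 1 if the embedding lies on its positive side."""
    h = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, embedding))
        h = (h << 1) | (1 if dot >= 0 else 0)
    return h


def hamming(a, b):
    """Number of differing bits; smaller roughly means more similar vectors."""
    return bin(a ^ b).count("1")


def index_size_bytes(corpus_bytes, chunk_bytes=1000, hash_bytes=3):
    """Index size if every `chunk_bytes` of text maps to one `hash_bytes` hash."""
    return corpus_bytes // chunk_bytes * hash_bytes


if __name__ == "__main__":
    planes = make_hyperplanes(dim=8)
    doc = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1, 1.5, -0.4]  # stand-in for a real embedding
    print(f"{lsh_hash(doc, planes):06x}")
    print(index_size_bytes(2 * 10**15))  # 2 PB corpus -> 6 * 10**12 bytes = 6 TB
```

With hashes like these, a query would be embedded and hashed the same way, then matched against stored hashes by exact bucket or small Hamming distance. How candidates are ranked after that is not specified in the post.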

Answers

answer by Robert Cousineau · 2025-04-10T21:49:28.961Z · LW(p) · GW(p)

Kagi seems to fully satisfy "provides a competitor to Big Tech" as well as any non-big tech competitor can be expected to (actively and consistently growing, good product, etc).

I do not believe they are open source, but they certainly seem less censorious.  

I would not personally consider this a reasonable use of money or time.  

comment by samuelshadrach (xpostah) · 2025-04-11T08:09:19.847Z · LW(p) · GW(p)

Open source is a requirement for me, as I want to:

  • search datasets that a big company would not legally be allowed to search, such as documents leaked by whistleblowers
  • search on an airgapped machine - so the whole world doesn't get to know what a team of political dissidents is searching for, for example

"I would not personally consider this a reasonable use of money or time."

Fair
