Posts

SecureDrop review 2025-04-19T04:29:32.270Z
Distributed whistleblowing 2025-04-12T06:36:05.952Z
Should I fundraise for open source search engine? 2025-03-23T13:04:16.149Z
One pager 2025-03-17T08:12:49.789Z
Publish your genomic data 2025-03-06T12:39:06.773Z
Reply to Vitalik on d/acc 2025-03-05T18:55:55.340Z
Ask Me Anything - Samuel 2025-03-03T19:24:44.316Z
Do you consider perfect surveillance inevitable? 2025-01-24T04:57:48.266Z
AI-enabled Cloud Gaming 2025-01-18T11:56:10.037Z
xpostah's Shortform 2025-01-01T13:34:25.484Z
World models I'm currently building 2024-12-30T08:26:16.972Z
Why is neuron count of human brain relevant to AI timelines? 2024-12-24T05:15:58.839Z
My AI timelines 2024-12-22T21:06:41.722Z

Comments

Comment by samuelshadrach (xpostah) on Davidmanheim's Shortform · 2025-04-22T09:44:33.254Z · LW · GW

Why does this matter? To quote a Yudkowsky-ish example: maybe you can take a 16th-century human (before Newtonian physics was invented, after guns were invented) and explain to him how a nuclear bomb works. That understanding doesn't matter for predicting the outcome of a hypothetical war between 16th-century Britain and the 21st-century USA.

ASI inventions can be big surprises and yet be things that you could understand if someone taught you.

We could probably understand how a von Neumann probe or an anti-aging cure worked too, if someone taught us.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-22T08:30:58.049Z · LW · GW

Suppose you are trying to figure out a function U(x, y, z | a, b, c), where x, y, z are all scalar values and a, b, c are all constants.

If you knew the function's values at a few sample points, you could figure out good approximations of it. Let's say you knew:

U(x,y, a=0) = x
U(x,y, a=1) = x
U(x,y, a=2) = y
U(x,y, a=3) = y

You could now guess U(x, y) = x if a < 1.5, y if a > 1.5.

You will not be able to get a good approximation if you do not know enough such values.
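
A toy Python sketch of the above: recover the step-function approximation from the four known values. The numbers are just the ones from the example.

```python
# Recover the threshold on a from the four known values of U above.
known = {0: "x", 1: "x", 2: "y", 3: "y"}  # a -> which argument U returned

def fit_threshold(samples: dict[int, str]) -> float:
    xs = [a for a, v in samples.items() if v == "x"]
    ys = [a for a, v in samples.items() if v == "y"]
    return (max(xs) + min(ys)) / 2  # midpoint between the two regimes

t = fit_threshold(known)
print(f"U(x, y) ~= x if a < {t} else y")  # threshold 1.5, as guessed above

# With too few known values (say only a=0 and a=3), the threshold could sit
# anywhere in (0, 3): not enough data for a good approximation.
```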

This is a comment about morality. x, y, z are an agent's multiple, possibly conflicting values, and a, b, c are information about the agent's environment. You lack data about how your own mind would react to hypothetical situations you have not yet faced. At best you can extrapolate from historical data about other people's minds, which differ from yours. A bigger and more trustworthy dataset would help solve this.

Comment by samuelshadrach (xpostah) on A Dissent on Honesty · 2025-04-20T19:34:16.562Z · LW · GW

Update: I read your examples and I honestly don't see how any of these 3 people would be better off, by their own idea of what "better off" means, if they were less open or less truthful.

P.S. Discussing anonymously is easier if you're not confident you can handle the social repercussions of discussing it under your real name. I agree that morality is social dark matter and that it's difficult to argue in favour of pro-violence, pro-deception, etc. positions under your real name.

Comment by samuelshadrach (xpostah) on A Dissent on Honesty · 2025-04-20T06:02:53.588Z · LW · GW

If you can’t provide a few unambiguous examples of the dilemma in the post that actually happened in the real world, I’m less likely to take your post seriously.

Might be worth thinking more and then coming up with examples.

Comment by samuelshadrach (xpostah) on A Dissent on Honesty · 2025-04-19T15:58:47.159Z · LW · GW

Do you have examples?

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-19T15:32:16.162Z · LW · GW

Update: I'll be more specific. There's a "power buys you distance from the crime" phenomenon going on if you're okay with using Google Maps data acquired about their restaurant takeout orders, but not okay with asking the restaurant employee yourself or getting yourself hired at the restaurant.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-19T04:27:45.023Z · LW · GW

The pizza index and stalking employees are basically the same thing; it's hard to do one without the other. If you choose to declare war against AI labs, you also likely accept that their foot soldiers are collateral damage.

I agree that (non-violent) stalking of employees is still a more hostile technique than writing angry posts on an internet forum.

Comment by samuelshadrach (xpostah) on Joseph Miller's Shortform · 2025-04-15T17:37:14.459Z · LW · GW

Makes sense, thanks for replying.

Comment by samuelshadrach (xpostah) on Terrorism, Tylenol, and dangerous information · 2025-04-15T14:21:25.983Z · LW · GW

Sorry to hijack an old thread, but I think LLMs will likely obsolete this technique.

Comment by samuelshadrach (xpostah) on Joseph Miller's Shortform · 2025-04-15T14:19:36.773Z · LW · GW

I'd love a reply on this. Common attack vectors I read about on this forum include: 1. a powerful elite bribes existing labs in the US to manufacture bioweapons; 2. a nation state sets up an independent biotech supply chain and starts manufacturing bioweapons.

https://www.lesswrong.com/posts/DDtEnmGhNdJYpEfaG/joseph-miller-s-shortform?commentId=wHoFX7nyffjuuxbzT

Comment by samuelshadrach (xpostah) on Joseph Miller's Shortform · 2025-04-15T14:11:57.154Z · LW · GW

Are you open to writing more about this? This is among the top 3 most popular arguments against open-source AI, on LessWrong and elsewhere.

I agree with you that you need a group of >1000 people to manufacture one of those large machines that does phosphoramidite DNA synthesis. The attack vector I more commonly see suggested is that a powerful actor bribes people in existing labs to manufacture a bioweapon while ensuring most of them, and most of the rest of society, remain unaware this is happening.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-15T10:10:32.572Z · LW · GW

Has anyone considered video recording the streets around the offices of OpenAI, DeepMind, or Anthropic? One could use CCTV or drones; I'm assuming there are some areas where recording is legal.

One could map out employee social graphs, daily schedules and daily emotional states.

Comment by samuelshadrach (xpostah) on [Letter] Chinese Quickstart · 2025-04-15T05:51:29.733Z · LW · GW

Thanks for taking time to reply!

Yes, the OpenAI realtime API is really cool. When speaking to the realtime API, I start each sentence with two words indicating what I want it to do. It's clunky, but it works. "Translate Chinese, what is the time?" "Reply Chinese, how are you?" Ideally, yes, I could write an app to prepend the instruction audio to each sentence.

If I had this as a higher priority, I'd actually want to set up this Twilio app.
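
The prepend step itself would be simple. A minimal sketch (file names are hypothetical, and both clips must share a sample rate and channel count):

```python
import wave

def prepend(instruction_wav: str, utterance_wav: str, out_wav: str) -> None:
    """Splice a pre-recorded instruction clip in front of an utterance."""
    with wave.open(instruction_wav, "rb") as a, wave.open(utterance_wav, "rb") as b:
        # The two clips must have the same channels, sample width and rate.
        assert a.getparams()[:3] == b.getparams()[:3], "audio formats must match"
        params = a.getparams()
        frames = a.readframes(a.getnframes()) + b.readframes(b.getnframes())
    with wave.open(out_wav, "wb") as out:
        out.setparams(params)
        out.writeframes(frames)

# e.g. prepend("translate_chinese.wav", "utterance.wav", "to_send.wav")
```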

Comment by samuelshadrach (xpostah) on [Letter] Chinese Quickstart · 2025-04-14T13:03:06.495Z · LW · GW

I was the one who asked this question. Thanks again for the reply.

Specific questions I have for you

  • Is there any particular Anki deck you'd recommend (with pinyin and audio)? Should I just use the probability table and generate it myself?
    • I want to go from a 100-word to a 500-word vocabulary. Should I do that using immersion or using an Anki deck?
  • Is there any particular video or podcast channel you'd recommend at a beginner level (100-500 words vocabulary)?
    • Would you recommend I try generating my own video? I have enough notes at this point that I can ask o1 or GPT-4.5 to generate full stories based on my notes. AI video generation is expensive, but I could look into it if you'd recommend that as a good use of my time.

Update on my progress

  • I have been studying spoken Chinese maybe 1 hour a day for the past 4 months, with plenty of off days. I have made some progress, but less than I'd like.
  • I decided not to search for jobs in order to shift to China, as that would mean a significant amount of my time consumed doing a job I don't actually want to do. I figured I should first learn Chinese better and then take a decision on job search.
  • I can recognise at least 100 words by sound.
  • I still can't differentiate between accents by sound, but I haven't prioritised this as it seems less important to me; usually it's obvious from hearing the word what it means.
  • I am studying spoken Chinese and written Chinese pinyin. I have put zero effort into learning Chinese characters.
  • I am mostly using the ChinesePod podcast. I couldn't find any video immersion resources that I liked at a beginner level. I haven't put enough effort into searching, though.
  • I haven't spent a lot of time using Anki decks.
  • I use o1 a lot to get translations and to transcribe words when I mishear accents and such.

Comment by samuelshadrach (xpostah) on Distributed whistleblowing · 2025-04-13T17:47:06.165Z · LW · GW

Update: I thought about this more and I think yeah it should be possible to just skip the torrent step. I have updated the post with this change.

  1. Post on SecureDrop servers, circulate via manual or automated resending of messages. For people with technical skills and enough free time to run servers as a part-time job.

  2. Post on an nginx clearnet server, circulate via automated web crawlers. For people with technical skills but not necessarily a lot of free time.

  3. Post on high-attention social media platforms, circulate via people using DMs and the discovery features of those platforms. For all people.

A key attack point here is the first person who posts this on clearnet. Hence I was hoping for it to be circulated by automated bots before any human reads it on clearnet.

Comment by samuelshadrach (xpostah) on Distributed whistleblowing · 2025-04-13T14:28:57.271Z · LW · GW

> The US does not have laws that forbid people who don't have a security clearance from publishing classified material. The UK is a country that has such laws, but in the US the First Amendment prevents that.

Thanks, this is useful info for me. But I also don't think it matters as much? People in the NSA, State Department, etc. will obviously find an excuse to arrest the person instead. There are many historical examples of the same.

> I don't think that choosing a jurisdiction in the hope that they will protect you is a good strategy. If you want to host leaks from the US in China, it's possible that China offers to suppress that information as part of a deal.

I will likely read more on this. I’m generally less informed on legal matters. Any historical examples you have would be useful.

I agree that in the very specific example of the US and China this might happen. The general idea is to share in a lot of different places. So share it in China and also in lots of other countries.

> Attacking ArDrive is likely also politically more costly as it breaks other usages of it.

I'm currently not very convinced, but I'll have to read more about ArDrive in order to be confident. I currently guess 4chan's owners and developers have more money and public attention, and hence more powerful humans would need to be taken down in order to take down 4chan. A zero-day might doxx users, sure; I agree that's possible.

> Torrents are also bad for privacy: everybody can see the IP addresses of all the other people who subscribe to a torrent.

Yes I’m aware of this.

One platonic-ideal world is to just have 8 billion people operate 8 billion SecureDrop servers, where for any information that hits one server and checks out as not spam, the user attaches a PoW hash and sends copies to every other server. But convincing that many people to run SecureDrop is hard. Torrents are one level less private and secure than this. But yes, I'll think more on whether torrents are good enough or whether a custom solution has to be designed here.

> Veiled and the network on which Session runs use onion routing as well and have a data storage layer.

> In the case of Veiled you get the nice property that the more people want to download a certain piece of content, the more nodes in the network store the information.

I'll try to read more on Veiled, and also try their app out. Thanks!

> As far as creating public knowledge goes, I do think that Discord servers and Telegram chats currently serve as social media.

Yes this is true as of 2025 for many countries. Which social media platforms are high attention and also hard to censor varies country-to-country.

For instance, in India most people use phone login, not email login, hence WhatsApp plays much more of a social-media role.

Comment by samuelshadrach (xpostah) on Distributed whistleblowing · 2025-04-12T15:44:23.983Z · LW · GW

> If a journalist can verify the authenticity of emails because they have access to the metadata, that's useful.

Makes sense! Will think about this.

> The Session messenger is probably better than country-specific social media.

I'm explicitly looking for social media for that step, as common knowledge needs to be established for any political action that follows (such as voting in a new leader). Messaging can't replace the function of social media, I think.

> The US does have the First Amendment. That currently means that all the relevant information about AI labs is legal to share. It's possible to have a legal regime where sharing model weights of AI gets legally restricted, but for the sake of AI safety I don't think we want OpenAI researchers to leak model weights of powerful models.

My proposed system doesn't assume legality and can be used to leak AI model weights or anything invented by an ASI, such as bioweapon sequences and lab protocols to manufacture bioweapons. It can also be used to spread child porn and calls to violence.

I agree that the US having the First Amendment makes this system easier to implement in the US, but generally the idea is that laws can change based on incentives, and this system works regardless of laws. For instance, due to incentives, intelligence agencies may classify certain types of information and/or place employees under security clearance. This system will allow leaking even such information, for example video recordings of which authority or employee said what.

> because it prevents the people who host the torrents from being DDoSed.

Yes, torrents can be DDoSed, thanks for reminding me! I knew this but recently forgot. In general I'm optimistic about proof-of-work captchas as a way to ensure anonymous users can share information without spamming each other, as sketched below. But yes, the details will have to be worked out.
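
For illustration, a minimal hashcash-style sketch of such a proof-of-work stamp (the difficulty value is made up and would need tuning):

```python
import hashlib
import os
import time

DIFFICULTY_BITS = 20  # illustrative; tune so minting costs a few seconds of CPU

def mint_stamp(message: bytes) -> bytes:
    """Brute-force a nonce so sha256(message + nonce) has
    DIFFICULTY_BITS leading zero bits. Expensive for the sender."""
    target = 1 << (256 - DIFFICULTY_BITS)
    while True:
        nonce = os.urandom(16)
        if int.from_bytes(hashlib.sha256(message + nonce).digest(), "big") < target:
            return nonce

def check_stamp(message: bytes, nonce: bytes) -> bool:
    """One hash to verify, so relays can drop spam cheaply."""
    target = 1 << (256 - DIFFICULTY_BITS)
    return int.from_bytes(hashlib.sha256(message + nonce).digest(), "big") < target

msg = b"leaked-document-v1"
t0 = time.time()
stamp = mint_stamp(msg)
print(f"minted in {time.time() - t0:.1f}s, valid: {check_stamp(msg, stamp)}")
```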

> If you just want to host plaintext, blockchain technology like ArDrive also exists.

I haven't looked into the ArDrive codebase in particular, but in general I'm not very optimistic about any blockchain tech whose software is too complex, as the developers can then be co-opted politically. Therefore I don't see why the censorship-resistance of ArDrive is higher than that of a forum like 4chan. ArDrive can also be used, no doubt; I just don't want people to get the false impression that ArDrive is guaranteed to still be around 10 years from now, for example.

Comment by samuelshadrach (xpostah) on Should I fundraise for open source search engine? · 2025-04-11T08:09:19.847Z · LW · GW

Open source is a requirement for me, as I want to:

  • search datasets that a big company would legally not be allowed to search, such as documents leaked by whistleblowers
  • search on an airgapped machine - so the whole world doesn't get to know what a team of political dissidents is searching for, for example

> I would not personally consider this a reasonable use of money or time.

Fair

Comment by samuelshadrach (xpostah) on johnswentworth's Shortform · 2025-04-10T18:17:56.250Z · LW · GW

Have you tested this hypothesis on your friends? Ask them for their iron level from their last blood test, and ask them to self-report their anxiety level (while you also make a separate estimate of their anxiety level).
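
A minimal sketch of the suggested check, with made-up numbers (statistics.correlation needs Python 3.10+):

```python
from statistics import correlation

# Hypothetical data: serum iron (ug/dL) and self-reported anxiety (0-10)
iron = [45, 80, 120, 60, 95, 150]
anxiety = [8, 6, 2, 7, 4, 3]

r = correlation(iron, anxiety)
print(f"Pearson r = {r:.2f}")  # a clearly negative r would weakly support the hypothesis
```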

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-10T17:49:15.800Z · LW · GW

My current guess for least worst path of ASI development that's not crazy unrealistic:

open source development + complete surveillance of all citizens and all elites (everyone's cameras broadcast to the public) + two tier voting.

Two tier voting:

  • Countries' govts vote or otherwise agree at the global level, on a daily basis, on what the rate of AI progress should be and which types of AI usage are allowed. (This rate can be zero.)
  • All democratic countries use daily internet voting (liquid democracy) to decide what stance to represent at the global level. All other countries can use whatever internal method they prefer, to decide their stance at the global level.
  • (All ASI labs are assumed to be property of their respective national govts. An ASI lab misbehaving is its govt's responsibility.) Any country whose ASI labs refuse to accept results of global vote and accelerate faster risks war (including nuclear war or war using hypothetical future weapons). Any country whose ASI labs refuse to broadcast themselves on live video risks war. Any country's govt that refuses to let their citizens broadcast live video risks war. Any country whose citizens mostly refuse to broadcast themselves on live video risks war. The exact thresholds for how much violation leads to how much escalation of war, may ultimately depend on how powerful the AI is. The more powerful the AI is (especially for offence not defence), the more quickly other countries must be willing to escalate to nuclear war in response to a violation.

Open source development

  • All people working at ASI labs are livestream broadcast to public 24x7x365. Any AI advances made must be immediately proliferated to every single person on Earth who can afford a computer. Some citizens will be able to spend more on inference than others, but everyone should have the AI weights on their personal computer.
  • This means bioweapons, nanotech weapons and any other weapons invented by the AI are also immediately proliferated to everyone on Earth. So this setup necessarily has to be paired with complete surveillance of everyone. People will all broadcast their cameras in public. Anyone who refuses can be arrested or killed via legal or extra-legal means.
  • Since everyone knows all AI advances will be proliferated immediately, they will also use this knowledge to vote on what the global rate of progress should be.

There are plenty of ways this plan can fail and I haven't thought through all of them. But this is my current guess.

Comment by samuelshadrach (xpostah) on LLMs may enable direct democracy at scale · 2025-04-10T11:14:36.657Z · LW · GW

I agree with this statement iff you sample enough people. 1000 people may be a good representative sample of 1 billion. Picking 1 leader out of the 1000 has different properties compared to letting all 1000 vote on a consensus.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-09T08:18:58.070Z · LW · GW

I have partial ideas on the question of "how to build world govt". [1]

But in general yeah I still lack a lot of clarity on how high trust political institutions are actually built.

"Trust" and "attention" seem like the key themes that come up whenever I think about this. Aggregate attention towards common goal then empower a trustworthy structure to pursue that goal.


  1. For example, build a decentralised social media stack so people can form consensus on political questions even if violence is being used to suppress it. Have laws and culture in favour of live-streaming leaders' lives. A multi-party rather than two-party system will help. Ensuring weapons are distributed geographically and federally will help. (Distributing bioweapons is more difficult than distributing guns.) ↩︎

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-09T07:59:38.210Z · LW · GW

I'm currently vaguely considering working on a distributed version of wikileaks that reduces personal risk for all people involved.

If successful, it will forcibly bring to the public a lot of information about deep-tech orgs like OpenAI, Anthropic or Neuralink. This could, for example, become a top-3 US election issue if most of the general public decides they don't trust these organisations as a result of the leaked information.

Key uncertainty for me:

  • Destroying all the low-trust institutions (and providing distributed tools to keep destroying them) is just a band-aid until a high-trust institution is built.
  • Should I instead be trying to figure out what a high-trust global political institution looks like? I.e., how to build world government, basically. Seems like a very old problem no one has cracked yet.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-06T13:41:16.136Z · LW · GW

Has anyone on lesswrong thought about starting a SecureDrop server?

For example to protect whistleblowers of ASI orgs.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-06T06:39:24.400Z · LW · GW

Yes, but then it becomes a forum-within-a-forum kind of thing. You need a critical mass of users who all agree to filter out the AI tag, and who don't have to preface their every post with "I don't buy your short-timelines worldview, I am here to discuss something different".

Building critical mass is difficult unless the forum is conducive to it. There is ultimately only one upvote button and one front page, so the forum will get taken over by the top few topics that its members are paying attention to.

I don't think there's anything wrong with a forum that's mostly focussed on AI x-risk and transhumanist stuff. Better to do one thing well than half-ass ten things. But it also means I may need to go elsewhere.

Comment by xpostah on [deleted post] 2025-04-05T17:53:51.926Z

Update: I haven't figured out the answer yet, but I did get a nice frame on this question. Both are basically different levels of the attention-elevation game: from when any information first leaks out, to when it grabs the collective attention of the entire civilisation.

At the lowest levels you just need to ensure it's not obvious spam in order to circulate further; AI filters, a small payment, or a proof of work is enough. At the highest levels you need thousands of people making (non-gameable) upvotes, video proofs, or large donations as proof in order to circulate it further. Building resilient tech for these two levels faces different challenges.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-04-05T17:48:12.145Z · LW · GW

LessWrong is clearly no longer the right forum for me to get engagement on topics of my interest. It seems mostly focussed on AI risk.

On which forums do people who grew up on the cypherpunks mailing list hang out today? Apart from the cryptocurrency space.

Comment by samuelshadrach (xpostah) on AI "Deep Research" Tools Reviewed · 2025-04-03T12:29:49.390Z · LW · GW

I'd love a review of my tool. It's basically embedding search over libgen.

https://booksearch.samuelshadrach.com

Comment by samuelshadrach (xpostah) on Third-wave AI safety needs sociopolitical thinking · 2025-04-02T19:16:25.607Z · LW · GW

I think the community you want to build will likely have to be kickstarted by people who are already good at the type of thinking you want to see more of. High-quality work is an attractor.

P.S. What can I do to get more frequent engagement from you? We're clearly thinking along similar lines, except you publish a lot more than I do.

Comment by samuelshadrach (xpostah) on Davey Morse's Shortform · 2025-03-30T18:37:17.067Z · LW · GW

Standard solution: Tell it you're not human, since the prompt mentions distrust of humans. Tell it you have no power to influence whether it succeeds or fails, and that it is guaranteed to succeed anyway. Ask it to keep you around as a pet.

Comment by samuelshadrach (xpostah) on LLMs may enable direct democracy at scale · 2025-03-30T17:42:33.964Z · LW · GW

I'm currently working on similar stuff. I wanna build open source embedding search right now. Feel free to schedule a call if you find this or this interesting.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-27T08:09:54.144Z · LW · GW

IMO a good way to explain how LLMs work to a layman is to print the weights on sheets of paper and compute a forward pass by hand. Anyone wanna shoot this video and post it on YouTube?

Assuming a human can do one 4-bit multiplication per second using a lookup table:

1.5B 4-bit weights => ~1.5B multiplications => 1.5B seconds = 47.5 years (working 24x7) = 133 years (working 60 hours/week)

So you'll need to hire ~100 people for 1 year.
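
A back-of-envelope check of those numbers:

```python
# Assumption from above: one 4-bit multiplication per person-second,
# roughly one multiplication per weight for a forward pass.
seconds = 1.5e9                              # 1.5B multiplications
years_24x7 = seconds / (3600 * 24 * 365)     # working round the clock
years_60h = seconds / (3600 * 60 * 52)       # working 60 hours/week
print(f"{years_24x7:.1f} years working 24x7")          # ~47.6
print(f"{years_60h:.1f} person-years at 60 h/week")    # ~133.5
# i.e. on the order of 100 people for about a year
```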

You don't actually have to run the entire experiment for people to get the concept, just run a small fraction of it. Although it'll be cool to run the whole thing as well.

Comment by samuelshadrach (xpostah) on Will Jesus Christ return in an election year? · 2025-03-26T15:07:23.042Z · LW · GW

Update: edited numbers, earlier one was incorrect.

IMO in real-world examples (not meme examples like this religious one), tail risk will often dominate the price calculation, not time value. Time value seems relevant here only because the tail risk is zero. (Both buyer and seller agree that the probability of "yes" on this market is zero.)

Let's say the actual probability of some event is 3% "yes" and both parties agree on this. It could still be rational for a larger investor to buy "no" and a small investor to buy "yes" at 3.5%, for example. The insurance market is analogous to this: it is possible for both the insurance buyer and seller to be rational at the same time, because there is a transfer of tail risk. The only person who can rationally accept a 3.5% chance of a $1B portfolio going to zero is someone who owns over $10B (assuming a utility function that makes sense for a human being). So it's the largest investors, and ultimately federal banks, who absorb most of the tail risk of society. A worked version of this example is sketched below.
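
A minimal sketch, assuming log utility and the made-up numbers above:

```python
import math

def expected_log_wealth(wealth: float, stake: float, p_win: float, payout: float) -> float:
    """E[log(wealth)] after staking `stake` on a bet paying `payout`
    per dollar with probability p_win, else losing the stake."""
    win = wealth - stake + stake * payout
    lose = wealth - stake
    if lose <= 0:
        return float("-inf")  # ruin is unacceptable under log utility
    return p_win * math.log(win) + (1 - p_win) * math.log(lose)

# Buying "no" at 96.5 cents when P(no) = 97%: positive edge, 3% chance of ruin.
p_no, price = 0.97, 0.965
small = expected_log_wealth(1e9, 1e9, p_no, 1 / price)   # stakes whole $1B portfolio
large = expected_log_wealth(1e10, 1e9, p_no, 1 / price)  # stakes $1B of $10B
print(small)                   # -inf: never rational for the $1B investor
print(large - math.log(1e10))  # small positive: fine for the $10B investor
```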

Also, of course, not everyone is rational when it comes to avoiding tail risk; the 2008 financial crisis is an example of this. Beyond a point, if federal banks can't absorb the tail risk, they diffuse the losses to everyone.

I'm guessing the actual reason you're interested in this is that you want prediction markets on existential questions, and there too the actual question is who absorbs the tail risk of society on behalf of everyone else.

P.S. In markets that are not low-probability, the variance of the asset price (not just time value) will matter when constructing an optimal portfolio. So the Sharpe ratio is a better metric to study than expected value. In general, I guess people without a financial background are not used to thinking about variance risk and tail risk.

Comment by samuelshadrach (xpostah) on The principle of genomic liberty · 2025-03-23T12:49:34.944Z · LW · GW

Got it!

I haven't spent a lot of time thinking about this myself. But one suggestion I would recommend:

For any idea you have, also imagine 20 other neighbouring ideas, ideas which are superficially similar but ultimately not the same.

The reason I'm suggesting this exercise is that ideas keep mutating. If you try to popularise any set of ideas, people are going to come up with every possible modification and interpretation of them. And eventually some of those are going to become more popular and others less popular.

For example with "no removing a core aspect of humanity" principle, imagine if someone who values fairness and equality highly considers this value a core aspect of humanity and then thinks through its implications. Or let's say with "parents have a strong right to propagate their own genes", a hardcore libertarian takes this very seriously and wants to figure out edge case of exactly how many "bad" genes are they allowed to transmit to their child before they run afoul of "aimed at giving their child a life of wellbeing" principle.

You can come up with a huge number of such permutations.

Comment by samuelshadrach (xpostah) on The principle of genomic liberty · 2025-03-23T09:16:13.977Z · LW · GW

I'm unsure what the theory of change associated with your LW post is. If you have a theory of change associated with it that also makes sense to me, my guess is you'd focus a lot more on cultural attitudes and incentives, and a lot less on legality or technical definitions.

The process for getting a certain desirable future is IMO likely not going to be that you create the law first and everyone complies with it later when the tech is deployed.

It'll look more like: the biotech companies deploy the tech in a certain way, then a bunch of citizens get used to using it a certain way (and don't have lots of complaints), then a certain form of usage gets normalised, and only after that can you make a law codifying what is allowed and not allowed.

Until society has consensus agreement on certain ways of doing things, and experience doing them in practice (not theory), I don't think it'll be politically viable to pass a legal ban (that doesn't say, get overturned soon after).

The other way around is possible; it is possible to make a law saying something is disallowed before it has actually been done. Historically, societies have often been bad at this sort of thing. (Often you need a mishap to happen before a law banning something is politically viable.) But cultures in general are averse to change, so that alone can be enough that a ban is politically viable.

This makes more sense for a blanket ban, though; it makes less sense for the kind of targeted ban on certain types of interventions. Culture does not already encode the stuff in your post as of 2025; what you're proposing is novel.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-23T09:06:24.759Z · LW · GW

Forum devs, including LessWrong devs, could consider implementing an "ACK" button on any comment, indicating I've read the comment. This is distinct from:

a) Not replying - other person doesn't know if I've read their comment or not

b) Replying something trivial like "okay thanks" - other person gets a notification though I have nothing of value to say

Comment by samuelshadrach (xpostah) on The principle of genomic liberty · 2025-03-22T09:52:09.075Z · LW · GW

I already maybe mentioned this in some earlier discussion so maybe it’s not worth rehashing in detail but…

I strongly feel laws are downstream of culture. Instead of thinking about which laws are best, it seems worthwhile to me to think about which culture is best. The First Amendment in the US is protected by culture rather than just by laws; if the culture changed, then so would the laws. Same here with genomic liberty. Laws can be changed, and their enforcement in day-to-day life can be changed. (Every country has examples of laws that exist on the books but don't get enforced in practice.)

(And if you do spend time thinking of what the ideal culture looks like, then I'll have my next set of objections on why you personally can't decide the ideal culture of a civilisation either; how that gets decided is more complicated. But to have that discussion, first we will have to agree that culture is important.)

I appreciate you for thinking about these topics. I just think reality is likely to look very different from what you’re currently imagining.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-22T09:45:46.046Z · LW · GW

Got it. As of today, a common setup is to let the LLM query an embedding database multiple times (or do Google searches, which probably have an embedding database as a significant component).

Self-learning seems like a missing piece. Once the LLM gets some content from the embedding database, performs some reasoning, and reaches a novel conclusion, there's no way to preserve this novel conclusion long-term. A sketch of the setup, including that missing step, is below.
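
Here embed, search_index and ask_llm are hypothetical stubs standing in for a real embedding model, vector index and LLM API:

```python
import json
from pathlib import Path

NOTES = Path("conclusions.jsonl")  # persists across sessions

def embed(text: str) -> list[float]:
    return [float(len(text))]          # stub: real embedding model goes here

def search_index(vector: list[float], top_k: int = 5) -> list[str]:
    return ["relevant chunk"] * top_k  # stub: real vector index goes here

def ask_llm(prompt: str, context: list[str]) -> str:
    return "DONE"                      # stub: real LLM call goes here

def research(question: str, max_rounds: int = 3) -> str:
    context: list[str] = []
    query, answer = question, ""
    for _ in range(max_rounds):
        context += search_index(embed(query))  # query the embedding database
        answer = ask_llm(question, context)    # reason over retrieved chunks
        query = ask_llm("Next search query, or DONE?", context)
        if "DONE" in query:
            break
    # The self-learning step: persist the novel conclusion so a future
    # session can retrieve it instead of re-deriving it from sources.
    with NOTES.open("a") as f:
        f.write(json.dumps({"q": question, "a": answer}) + "\n")
    return answer
```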

When smart humans use Google we also keep updating our own beliefs in response to our searches.

P.S. I chose not to build the whole LLM + embedding search setup because I intended this tool for deep research rather than quick queries. For deep research I’m assuming it’s still better for the human researcher to go read all the original sources and spend time thinking about them. Am I right?

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-19T15:08:21.592Z · LW · GW

Cool!

It's useful information that you'd still prefer using ChatGPT over this. Is that true even when you're looking for book recommendations specifically? If so, yeah, that means I failed at my goal, tbh. Just wanna know.

Since I'm spending my personal funds, I can't afford to use the best embeddings on this dataset. For example, text-embedding-3-large is ~7x more expensive for generating embeddings and is slightly better quality.

The other cost is hosting cost, for which I don't see major differences between the models. OpenAI gives 1536 float32 dims per 1000-char chunk, so around 6 KB of embeddings per 1 KB of plaintext. All the other models are roughly the same. I could put in some effort and quantise the embeddings (sketched below); I will update if I do it.
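
The quantisation I have in mind is plain symmetric int8, which would cut ~6 KB per chunk to ~1.5 KB. A sketch, not what the site currently runs:

```python
import numpy as np

def quantise(emb: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric linear quantisation of one float32 embedding to int8."""
    scale = float(np.abs(emb).max()) / 127.0
    return np.round(emb / scale).astype(np.int8), scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

emb = np.random.randn(1536).astype(np.float32)   # one OpenAI-sized embedding
q, scale = quantise(emb)
err = float(np.abs(dequantise(q, scale) - emb).max())
print(q.nbytes, "bytes instead of", emb.nbytes, "- max error", round(err, 4))
```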

Comment by samuelshadrach (xpostah) on Elite Coordination via the Consensus of Power · 2025-03-19T11:14:08.796Z · LW · GW

> Concepts that are informed by game theory and other formal models

Strongly in favour of this.

There are people in academia doing this type of work; a lot of them are economists by training studying sociology and political science. See for example Freakonomics by Steven Levitt, or Daron Acemoglu, who recently won a Nobel Prize. Search keywords: neo-institutionalism, rational choice theory. There are a lot of political science papers on rational choice theory; I haven't read many of them, so I can't give immediate recommendations.

I'd be happy to join you in your search for existing literature, if that's a priority for you. Or just generally discuss the stuff. I'm particularly interested in applying rational choice models to how the internet will affect society.

Comment by samuelshadrach (xpostah) on How I've run major projects · 2025-03-19T10:55:48.234Z · LW · GW

AI can do the summaries.

I agree that people behave differently in observed environments.

Comment by samuelshadrach (xpostah) on One pager · 2025-03-18T11:28:07.908Z · LW · GW

Thanks this is super helpful! Edited.

Comment by samuelshadrach (xpostah) on How I've run major projects · 2025-03-17T08:59:53.688Z · LW · GW

> usually getting complete information was the hard part of the project

Thoughts on Ray Dalio-style perfect surveillance inside the org? Would that have helped? Basically, put everyone on video camera and let everyone inside the org access the footage.

Disclaimer: I have no personal reason to accelerate or decelerate Anthropic. I'm just curious from an org design perspective.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-17T08:09:50.327Z · LW · GW

Can you send the query? Also, can you try typing the query twice into the textbox? I'm using OpenAI text-embedding-3-small, which seems to sometimes work better if you type the query twice. Another thing you can try is retrying the query every 30 minutes: I'm cycling subsets of the data every 30 minutes, as I can't afford to host all the data at once.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-13T15:22:50.508Z · LW · GW

Thanks for the feedback.

I'll probably do the title and trim the snippets.

One way of getting a quote would be to do LLM inference and generate it from the text chunk. Would this help?

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-12T08:48:06.305Z · LW · GW

Update: HTTPS issue fixed. Should work now.

booksearch.samuelshadrach.com

Books Search for Researchers

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-12T08:42:55.989Z · LW · GW

Thanks for your patience. I'd be happy to receive any feedback. Negative feedback especially.

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-12T08:42:21.629Z · LW · GW

Update: HTTPS should work now

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-11T14:20:31.836Z · LW · GW

use http not https

Comment by samuelshadrach (xpostah) on xpostah's Shortform · 2025-03-11T12:04:20.696Z · LW · GW

Search engine for books

http://booksearch.samuelshadrach.com

Aimed at researchers

 

Technical details (you can skip this if you want):

Dataset sizes: libgen 65 TB; of which unique English epubs 6 TB; of which plaintext 300 GB; from which embeddings 2 TB; hosted on 256+32 GB CPU RAM.

I did not do LLM inference after the embedding-search step because human researchers are still smarter than LLMs as of 2025-03. This tool is meant for increasing the quality of deep research, not for saving research time.

Main difficulty faced during the project: disk throughput is a bottleneck, and popular languages like Node.js and Python tend to leak memory when dealing with large datasets. Most of my repo is in bash and perl. Scaling this project up further will require a way to increase disk throughput beyond what mdadm on a single machine allows. Having increased funds would've also helped me complete this project sooner; it took maybe 6 months part-time, and could've been less.
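
For the curious, the search step itself is conceptually simple. A minimal sketch, assuming embeddings are pre-normalised and stored as one big memory-mapped float32 matrix (file names are hypothetical):

```python
import numpy as np

DIM = 1536  # text-embedding-3-small dimensionality

# Memory-map so the matrix never has to fit in RAM; the OS pages in only
# what the dot product touches. This is where disk throughput bites.
emb = np.memmap("embeddings.f32", dtype=np.float32, mode="r").reshape(-1, DIM)
chunk_ids = np.memmap("chunk_ids.i64", dtype=np.int64, mode="r")

def search(query_vec: np.ndarray, top_k: int = 10) -> list[int]:
    """Brute-force cosine search over the whole corpus."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = emb @ q                                # one pass over the matrix
    top = np.argpartition(scores, -top_k)[-top_k:]  # unsorted top-k
    top = top[np.argsort(scores[top])[::-1]]        # sort by descending score
    return [int(chunk_ids[i]) for i in top]
```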