Comments

Comment by KhromeM (khromem) on Using an LLM perplexity filter to detect weight exfiltration · 2024-07-22T22:33:51.579Z · LW · GW

My last statement was totally wrong. Thanks for catching that.

In theory it's probably even possible to get approximate weights by expending insane amounts of compute, but you could use those resources much more efficiently elsewhere.
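
For illustration, a minimal sketch of what "approximate weights via lots of compute" would mean in practice: query-based distillation, where a student network is fit to a black-box teacher's outputs. This is in PyTorch with toy MLPs standing in for an LLM; everything here is hypothetical and only meant to show the shape of the attack, not a workable recipe at LLM scale.

```python
# Minimal sketch of query-based model extraction: approximate a black-box
# "teacher" by training a student to imitate its outputs. Toy MLPs stand in
# for an LLM; a real attempt would need astronomically more queries/compute.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Black-box "teacher": we may only call it, never read its parameters.
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
for p in teacher.parameters():
    p.requires_grad_(False)

# Student with the same architecture, randomly initialized.
student = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

# Query the teacher on random inputs and fit the student to its outputs.
for step in range(5000):
    x = torch.randn(256, 16)
    with torch.no_grad():
        y_teacher = teacher(x)  # the only access we have to the teacher
    loss = nn.functional.mse_loss(student(x), y_teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The student ends up approximating the teacher's input-output behaviour, not
# its exact weights: permutation and scaling symmetries mean many different
# weight settings compute the same function, and the query budget scales
# brutally with model size.
```

Even in this toy setting you only recover the function, not the literal weight file, which is why spending the compute on training your own model is the more efficient use of those resources.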

Comment by KhromeM (khromem) on Using an LLM perplexity filter to detect weight exfiltration · 2024-07-22T00:35:02.299Z · LW · GW

I do not understand how you can extract weights just by conversing with an LLM, any more than you could learn how my neurons are wired by conversing with me. Extracting training data it has seen is one thing, but presumably it has never seen its own weights. If the system prompt did not tell it it was an LLM, it should not even be able to figure that out.