Posts

"It's a 10% chance which I did 10 times, so it should be 100%" 2024-11-18T01:14:27.738Z
Is Text Watermarking a lost cause? 2024-10-01T16:20:51.113Z

Comments

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T20:56:07.229Z · LW · GW

Yeah, I suppose we agree then.

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T20:28:23.655Z · LW · GW

Sure: 
For someone looking for a monogamous relationship, finding one successful partner has a value of 1.
Finding 2 successful partners also has a value of 1, because in a monogamous relationship, you only need one partner.
The same holds for 3, 4, etc. partners. All those outcomes also have a value of 1.
So first, let's find the probability of getting a value of 0. Then let's calculate the probability of getting a value of 1.
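The probability of getting a value of 0 (not finding a partner):

$$P(\text{value} = 0) = 0.9^{10} \approx 0.349$$

There is one other mutually exclusive alternative: finding at least one partner (which has a value of 1):

$$P(\text{value} = 1) = 1 - 0.9^{10} \approx 0.651$$

So we have a 34.9% chance of getting a value of 0 and a 65.1% chance of getting a value of 1. The expected value is:

$$0 \times 0.349 + 1 \times 0.651 = 0.651$$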
If you did this experiment a million times and assigned a value of 1 to "getting at least one monogamous partner" and a value of 0 to "getting no monogamous partners," you would get, on average, a reward of 0.651.

For the sake of brevity, I'll skip the calculations for a polygamous partner because we both agree on what the answer should be for that.
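
The "do this experiment many times" framing above can be checked with a quick simulation. This is only a minimal sketch: it assumes 10 independent attempts at a 10% chance each (the numbers from the post title), and the function name, trial count, and seed are made up for illustration.

```python
import random

def average_monogamous_reward(trials=100_000, attempts=10, p=0.10, seed=0):
    """Average reward when 'at least one success' is worth 1 and 'none' is worth 0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(trials):
        # Reward is 1 if any of the attempts succeeds, otherwise 0.
        found_any = any(rng.random() < p for _ in range(attempts))
        total += 1 if found_any else 0
    return total / trials

exact = 1 - (1 - 0.10) ** 10          # closed-form probability of at least one success
estimate = average_monogamous_reward()
print(f"exact: {exact:.3f}, simulated: {estimate:.3f}")
```

The simulated average should land close to the exact 0.651, with the gap shrinking as the number of trials goes up.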

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T19:42:57.288Z · LW · GW

Ah, shoot. You're right. Probably not good to use "odds" and "probability" interchangeably for percentages like I did. Should be fixed now.

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T19:36:46.768Z · LW · GW

95% is a lower bound. It's more than 95% for all n and approaches 95% as n gets bigger. If n=2 (e.g. a coin flip), then you actually have a 98.4% chance of at least one success after 3n (which is 6) attempts.

I mentioned this in the "What I'm not saying" section, but this limit converges rather quickly. I would consider any  to be "close enough"
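
For anyone who wants to see the convergence, here is a small sketch. It assumes the setup from the post as I described it above: a 1-in-n chance per attempt, repeated 3n times. The function name is just for illustration.

```python
import math

def chance_of_at_least_one(n, multiple=3):
    """Chance of at least one success in multiple*n attempts at a 1/n chance each."""
    return 1 - (1 - 1 / n) ** (multiple * n)

limit = 1 - math.exp(-3)  # value approached as n grows, roughly 0.9502

for n in (2, 6, 10, 100, 1000):
    print(n, round(chance_of_at_least_one(n), 4))
print("limit", round(limit, 4))
```

Every value printed sits above the limit, and they fall toward it as n increases, which is why 95% works as a lower bound.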

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T16:41:55.112Z · LW · GW

I think what Justin is saying is that finding a single monogamous partner is not significantly different from finding two, three, etc. For some things you only care about succeeding once. So a 63% chance of success (any number of times) means a 0.63 expected value (because every success after the first adds a value of 0).

Meanwhile for other things, such as polyamorous partners, 2 partners is meaningfully better than one, so the expected value truly is 1, because you will get one partner on average. (Though this assumes 2 partners is twice as good as one. We could complicate this even more by assuming that 2 partners is better, but not twice as good.)
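
To make the difference concrete, here is a rough sketch that computes the expected value under different ways of valuing the number of successes. It assumes 10 attempts at 10% each, so the "only the first success counts" case comes out at about 0.651 rather than the 0.63 it approaches with many small-chance attempts. The square-root value function is purely a made-up example of "better, but not twice as good".

```python
from math import comb, sqrt

def expected_value(value_of, attempts=10, p=0.10):
    """Expected value when k successes (binomially distributed) are worth value_of(k)."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k) * value_of(k)
        for k in range(attempts + 1)
    )

monogamous = expected_value(lambda k: min(k, 1))   # only the first success counts
linear = expected_value(lambda k: k)               # every success counts equally
diminishing = expected_value(lambda k: sqrt(k))    # hypothetical diminishing returns

print(f"monogamous: {monogamous:.3f}")
print(f"linear: {linear:.3f}")
print(f"diminishing: {diminishing:.3f}")
```

The diminishing-returns case lands between the other two, which is the "complicated" middle ground mentioned above.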

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T09:15:08.336Z · LW · GW

It's a great idea. I ended up bolding the one line that states my conclusion to make it easier to spot.

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T09:05:26.224Z · LW · GW

That's crazy how close that is.  (to the nearest half a percent) will be a fun fact that I remember now!

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T02:26:48.279Z · LW · GW

Haha, I didn't think of that. Funny.

Comment by egor.timatkov on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-18T01:52:09.865Z · LW · GW

<3

Comment by egor.timatkov on Is Text Watermarking a lost cause? · 2024-10-04T18:13:41.592Z · LW · GW

My guess is that it probably works, and it's useful to have, but I think the moment that it's made public in any way, people will break it pretty easily.

Comment by egor.timatkov on Is Text Watermarking a lost cause? · 2024-10-02T21:56:37.054Z · LW · GW

I haven't, no!
It seems interesting. I'll check it out.

Comment by egor.timatkov on Is Text Watermarking a lost cause? · 2024-10-02T21:46:05.882Z · LW · GW

Wow. This is some really interesting stuff. Upvoting your comment.

Comment by egor.timatkov on Is Text Watermarking a lost cause? · 2024-10-02T18:17:45.162Z · LW · GW

The a-before-e example is just there to explain, in a human-readable way, how a watermark works. The important bit is that each individual section of the text is unlikely to occur according to some objective scale, be it a-before-e, or hashing mod 10, or some other way.
I really like your example of hashing small bits of the text to 0 mod 10 though. I would have to look into how often you can actually edit text this way without significantly changing the meaning, but as soon as that's done, you can solve for an N and find how much text you need in order to determine the text is watermarked.
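
Here is a rough sketch of what that "hash mod 10" check and the "solve for N" step might look like. Everything specific in it is an assumption for illustration only: SHA-256 as the hash, 3-word chunks, a 1% significance threshold, and the idea that a watermarker manages to push some fraction `marked_rate` of chunks to hash to 0.

```python
import hashlib
from math import comb, floor

def chunk_hits(text, words_per_chunk=3, modulus=10):
    """Count chunks whose hash is 0 mod `modulus`. Returns (hits, number of chunks)."""
    words = text.split()
    last_start = len(words) - words_per_chunk
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, last_start + 1, words_per_chunk)]
    hits = sum(
        int(hashlib.sha256(c.encode("utf-8")).hexdigest(), 16) % modulus == 0
        for c in chunks
    )
    return hits, len(chunks)

def binomial_tail(n, k, p=0.1):
    """P(X >= k) for X ~ Binomial(n, p): how surprising k hits are in unmarked text."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def chunks_needed(marked_rate, base_rate=0.1, alpha=0.01, max_chunks=5000):
    """Smallest N where the typical hit count of watermarked text is significant."""
    for n in range(1, max_chunks + 1):
        typical_hits = floor(marked_rate * n)  # rounded down to stay conservative
        if binomial_tail(n, typical_hits, base_rate) < alpha:
            return n
    return None  # not reachable within max_chunks

hits, total = chunk_hits("the quick brown fox jumps over the lazy dog again")
print(hits, total, binomial_tail(total, hits))
print(chunks_needed(marked_rate=0.5))
```

The lower `marked_rate` is (that is, the less often you can edit a chunk without changing the meaning), the larger the N that `chunks_needed` returns.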

Comment by egor.timatkov on Is Text Watermarking a lost cause? · 2024-10-02T01:10:50.886Z · LW · GW

Yes, so indexing all generations is absolutely a viable strategy, though like you said, it might be more expensive.
Watermarking by choosing different tokens at a specific temperature might not be as effective (as you touched on), because in order to reverse that, you need the exact input. Even a slight change to the input or the context will shift the probability distribution over the tokens, after all. Which means you can't know if the LLM chose the first or second or third most probable token just by looking at the token.
That being said, something like this can still be used to watermark text: If the LLM has some objective, text-independent criterion for being watermarked (like the "e" before "a" example, or perhaps something more elaborate created using gradient descent), then you can use an LLM's loss function to choose some middle ground between maximizing your independent criterion and minimizing the loss function.
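
As a toy illustration of that middle ground, a token could be picked by adding a weighted watermark score to the model's log-probability. This is only a sketch: the candidate log-probabilities are made-up numbers, `toy_watermark_score` is a hypothetical stand-in criterion (not the exact rule from the post), and `strength` is an invented knob for the trade-off.

```python
def toy_watermark_score(token):
    """Hypothetical criterion: reward tokens that contain an 'e' and no 'a'."""
    return 1.0 if "e" in token and "a" not in token else 0.0

def pick_token(candidates, watermark_score, strength=0.5):
    """Pick the token maximizing log-probability + strength * watermark score.

    candidates maps each token to its log-probability under the model.
    strength=0 ignores the watermark; larger values favor the criterion more.
    """
    return max(
        candidates,
        key=lambda tok: candidates[tok] + strength * watermark_score(tok),
    )

# Made-up log-probabilities for the next word; a real LLM would supply these.
candidates = {"large": -0.9, "huge": -1.2, "big": -1.0, "immense": -3.5}

print(pick_token(candidates, toy_watermark_score, strength=0.0))
print(pick_token(candidates, toy_watermark_score, strength=0.5))
```

With `strength=0.0` the most probable token wins, while a modest `strength` tips the choice toward a nearly-as-likely token that satisfies the criterion, without reaching for very unlikely ones.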
The ideal watermark would put markings into the meaning behind the text, not just the words themselves. No idea how that would happen, but in that way you could watermark an idea, and at that point hacks like "translate to French and back" won't work. Although watermarking the meaning behind text is currently, as far as I know, science fiction.

Comment by egor.timatkov on Is Text Watermarking a lost cause? · 2024-10-01T19:01:30.151Z · LW · GW

I haven't, no. I really wish I could somehow investigate all 3 pillars of a good watermark (Decisiveness, Invisibility, Robustness), but I couldn't think of any way to quantify a general text watermark's invisibility. For any given watermark you can technically rate "how invisible it is" by using an LLM's loss function to see how different the watermarked text is from the original text, but I can't come up with a way to generalize this.
So unfortunately my analysis was only about the interplay between decisiveness and robustness.