On Contact, Part 1

post by james.lucassen · 2025-01-21T03:10:54.429Z · LW · GW

Contents

  Basic Contact
  Example Contact
  Enjoyable Contact
  Homebrew Contact
  Ideal Contact
  Theoretical Contact
  Equilibrium Contact
  Algorithmic Contact
  Numerical Contact
  Invincible Contact

Context: for fun (and profit?)

Basic Contact

Contact is a lightweight many-versus-one word guessing game. I was first introduced to it on a long bus ride several years ago, and since then it’s become one of my favorite games to play casually with friends. There are a few blog posts out there about contact, but I think it’s incredibly underrated.

The rules of contact are simple, but I often tell people it’s easier to learn by watching others play rather than by a verbal explanation of the rules. Nevertheless, here is a verbal explanation of the rules.

Example Contact

We join our players in medias res, at the start of the second round.

Enjoyable Contact

There are a few important rules for actually having fun with this game, beyond the hard rules of the game itself.

Homebrew Contact

There are various “house rules” you can add to contact to get a slightly different experience.

Ideal Contact

If you play for a little while, you will probably notice the following pattern: sometimes a defender will pick a rare or unusual word, the first round will proceed as normal, and then after the second or third letter is revealed the word will be entirely obvious. “Syzygy” is a particularly egregious example: once S-Y is revealed, the game is almost over, and once S-Y-Z is revealed, it’s guaranteed over.

This raises an interesting question. I’ve talked a bit about what makes for the most fun contact word. But what makes for a highly defensible contact word?

Intuitively, a defensible word should be long, so the defender has more “lives” before the last letter of the word is revealed. It should also be free of giveaway prefixes, such as “syz”. It should also still have some kind of “rare or unusual” quality, to make it less likely that attackers will guess it.

A pretty good strategy for generating defensible words is to pick a common prefix (“un”) or even a long common prefix if you can (“inter”) and then a rare completion (“interregnum”)[2]. In fact, prior work on the topic discusses this exact strategy, and helpfully identifies the most common prefixes in English.
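To make this concrete, here’s a minimal sketch of a prefix-counting version of that strategy. It assumes a newline-separated word list at words.txt (a hypothetical path) and surfaces long words hiding behind crowded early prefixes:

from collections import Counter

with open("words.txt") as word_file:
    words = [w.strip() for w in word_file if w.strip()]

# Count how many words share each possible prefix.
prefix_counts = Counter(w[:i] for w in words for i in range(1, len(w) + 1))

def crowding(word, depth=3):
    # How many vocabulary words still match once `depth` letters are revealed.
    return prefix_counts[word[:depth]]

# Long words behind crowded three-letter prefixes ("interregnum" territory).
candidates = sorted(words, key=lambda w: (crowding(w), len(w)), reverse=True)
print(candidates[:20])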

But I think we can do better than that. Key questions I’m interested in:

  1. The “common prefix + rare suffix” approach is obviously a clunky two-step approximation to some cleaner thing. What is that thing?
  2. A static list of “most defensible contact words” would be nice, but isn’t quite the thing I’m looking for. If my friends know I always pick one of the same ten words when defending, that makes the game easier for them, not harder. What’s the most defensible distribution when the attackers know the distribution?

Theoretical Contact

First things first. If we’re looking for the “most defensible” words/distributions, we need some precise setting, so that we can actually try to prove things about how different words/distributions perform, and whether they are optimal. This is the setting I’ll be starting with:

I’m interested in both the “black-box” case where the attackers don’t know the defender’s word choice distribution, and the “white-box” case where they do.

With just these assumptions, we can already identify some features of optimal play. The “no re-using clues” rule becomes irrelevant: it’s only included because humans have a harder time coming up with a new contact clue than re-using an old one. Our theoretical perfect attackers have no problem coming up with new clues, so they would never re-use one, since every fresh clue word comes with some chance of being the secret word itself and winning on the spot.

When p_block=1, the defender doesn’t have to worry about losing by giving away letters. The only lose condition left is the attackers guessing the secret word outright, while only the first letter has been revealed. The optimal strategies here are pretty obvious. Black-box, the defender chooses the most common first letter[4], and then chooses any distribution they want over words starting with that letter. White-box, the defender chooses the most common first letter and then is forced to choose the secret word uniformly at random over the words that start with that letter.
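For illustration, finding that most common first letter from the words list in the earlier sketch takes a couple of lines:

# Reusing `words` and Counter from the sketch above.
first_letter = Counter(w[0] for w in words).most_common(1)[0][0]
# White-box defense: pick uniformly among the words starting with that letter.
candidates = [w for w in words if w[0] == first_letter]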

Is the p_block=0 case similarly easy? Let’s examine the black-box case first. Assuming the vocabulary is reasonably large relative to its maximum depth, running out of letters becomes a much more pressing concern than the attackers guessing the word at random. So should the defender just choose the longest word in the vocabulary? If we’re willing to assume that every subtree in the vocabulary has a number of leaf nodes much greater than its depth, then yes, pretty much. If we’re not willing to assume that, there are problems[5], but not problems I’m particularly interested in right now.
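(In code, continuing the same sketch: under that density assumption the black-box defender can just play max(words, key=len), or sample among the words tied for longest.)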

I’m mainly interested in the white-box case, where the defender can’t afford to concentrate their distribution so much for fear of the attackers guessing the secret word. For example, deterministically picking the longest word will get sniped and lose immediately! The defender has to compromise somehow between concentrating their distribution on long words and spreading the distribution out across different words. What is the best way to make that compromise?

Equilibrium Contact

I’m not particularly familiar with whatever subfield of game theory deals with random strategies in this kind of white-box case, I just vaguely know it exists. And frankly it seems more exciting to try and invent stuff myself than to do a literature review right now. So let me try and do that.

One hunch I have is that for the optimal distribution, each word in the distribution should have an equal marginal contribution to the risk of losing. I’d say something like “the derivative of expected lifetime with respect to the probability on a given word should be the same for all words with nonzero probability”, but there’s a problem with taking the derivative w.r.t. the probability mass placed on a particular value, since the distribution has to sum to 1. So we have some work to do in turning this intuition into something formal.

The natural place to start is writing down the defender’s expected lifetime at a given game state, so we can talk about “marginal contribution to risk of losing”. We’ll consider the game state before the attackers submit their guess, and we’ll count the guess which matches the secret word as part of the defender’s score. In other words, every defensive policy has an expected lifetime of at least one, except for the defender with an empty vocabulary, who could have a lifetime of 0 if you want to think about it that way.

When the attackers submit their guess, there’s a chance the attackers will guess the secret word, in which case the defender’s lifetime is 1. If the attackers don’t guess the secret word, then the defender survives for that turn, plus the expected lifetime of the game state with one more letter revealed.

E(lifetime | secret, prefix) = p(guess secret | prefix) + (1-p(guess secret | prefix))(1+E(lifetime | secret, prefix++))

If the attackers know the defender’s distribution and are guessing optimally, then they always guess the word the defender puts the most probability on, among the words still consistent with the prefix revealed so far.

p(guess secret | prefix) = max(def_dist(secret | prefix))

E(lifetime | secret, prefix) = max(def_dist(secret | prefix)) + (1 - max(def_dist(secret | prefix)))(1 + E(lifetime | secret, prefix++))

We now have the function for the expected lifetime of a given state expressed in terms of itself and the defender’s distribution. This should be all we need to identify the ideal distribution! But unfortunately, the relationship between E(lifetime | secret, prefix) and E(lifetime | secret, prefix++) is gnarly and complicated, because it has to do with the real-life English language. We’re not going to get a nice closed-form solution for E(lifetime | secret, prefix) or for the optimal distribution. But we can write an algorithm to compute it, given a vocabulary.
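As a sanity check on the recursion, take a toy vocabulary of just AA and AB with a uniform defense distribution (my made-up example, nothing canonical). A fully revealed word always falls to the next guess, so its expected lifetime is 1, and the prefix A works out like so:

# E(lifetime | secret, "A") for the toy vocabulary [AA, AB], uniform defense.
max_p = 0.5          # attackers guess whichever word has the most mass: 1/2
leaf_lifetime = 1.0  # a fully determined word falls to the very next guess
e_life = max_p * 1 + (1 - max_p) * (1 + leaf_lifetime)
print(e_life)  # 1.5 guesses survived, in expectation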

Algorithmic Contact

The naive way of doing this involves working backwards through all prefixes of valid words, from most complete to least complete (a bottom-up level traversal). This is dynamic programming, if you want to think of it that way, with one DP level per prefix length, up to the length of the longest valid word.

If the prefix under consideration is a leaf of the prefix tree, from there the expected lifetime is 1. If it’s not a leaf, then because we’re doing a bottom-up level traversal we should already have access to the expected lifetimes of each child prefix, E(lifetime | secret, prefix++). Now we have all the parts we need to maximize E(lifetime | secret, prefix). How do we actually do that?

The form of what we’re trying to do here is take some pair of vectors <L>, <M> and optimize some vector <P> to maximize F = max(hadamard[6](<M>,<P>)) + (1 - max(hadamard(<M>,<P>)))(1 + dot(<L>,<P>)), which simplifies to 1 + dot(<L>,<P>)(1 - max(hadamard(<M>,<P>))). <L> is the expected lifetime of each child, <M> is the maximum probability on any word given by each child’s distribution, and <P> is the vector of probabilities we assign to each child. ChatGPT o1-preview agrees with me that this probably isn’t going to admit a closed-form solution, and suggests numerical optimization.

The gain (opposite of loss, because we’re maximizing) landscape for this optimization has a few discernible properties. Everything should be continuous, but the max is going to make the gain non-differentiable on the manifolds where max(hadamard(<M>,<P>)) switches from being determined by one parameter to another. In regions where a given parameter P_i is not determining the max, the gain landscape will be linear in that parameter, with slope proportional to L_i. Problem is, this structure is all in the basis dimensions, and our optimization has to live on the subspace where the parameters sum to 1[7].

Without that structure, my geometric intuition isn’t feeling up to the task of trying to write the optimization algorithm myself in some insightful way. Looks like we’re out of theory runway, and it’s on to the programming part of this post[8].

Numerical Contact

Here’s a function for the expected lifetime of a game state:

import numpy as np
import scipy.optimize as opt

def life(probabilities, lifetimes, maxes):
    # max_p: the chance the attackers' single best guess hits the secret word,
    # i.e. the largest total probability mass sitting on any one word.
    max_p = np.max(np.multiply(maxes, probabilities))
    # The recursion from above: max_p + (1 - max_p) * (1 + E[child lifetime]).
    return max_p + (1 - max_p) * (1 + np.dot(lifetimes, probabilities))

We just have to wrap that and negate it, since scipy wants a loss to minimize instead of maximize.

def make_objective(lifetimes, maxes):
    # scipy minimizes, so hand it the negated expected lifetime.
    def objective(probabilities):
        return -life(probabilities, lifetimes, maxes)
    return objective

def optimize_probabilities(lifetimes, maxes):
    objective = make_objective(lifetimes, maxes)
    # The probabilities must sum to 1, with each component in [0, 1].
    prob_constraint = [{'type': 'eq', 'fun': lambda p: np.sum(p) - 1}]
    prob_bounds = [(0, 1) for _ in range(lifetimes.shape[0])]
    # Start from the uniform distribution over children.
    init = np.full(lifetimes.shape[0], 1 / lifetimes.shape[0])
    result = opt.minimize(objective,
                          init,
                          bounds=prob_bounds,
                          constraints=prob_constraint)
    return result.x, -result.fun
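
As a quick smoke test with made-up numbers: a node with one deep, diffuse child and one bare leaf should shove essentially all of its mass onto the deep child.

# Child 0: expected lifetime 4, max mass on any single word 0.1.
# Child 1: a bare leaf (lifetime 1, max word mass 1.0).
probs, e_life = optimize_probabilities(np.array([4.0, 1.0]),
                                       np.array([0.1, 1.0]))
print(probs, e_life)  # expect probs ~ [1, 0]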

Ok, great. Now, what do we actually do with that? Well, the other part of our algorithm is doing a backwards level traversal over the tree of all prefixes to valid words. So let’s set that up:

def analyze_vocabulary(path):
    # Build a trie of all prefixes of valid words, one nested dict per letter.
    prefix_trie = {}
    with open(path) as word_file:
        for word in word_file:
            position = prefix_trie
            for letter in word.strip():
                if letter not in position:
                    position[letter] = {}
                position = position[letter]
            # '$' marks the end of a complete word.
            position['$'] = {}
    return prefix_trie
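
As a quick sanity check, a word list containing just “cat” and “car” produces {'c': {'a': {'t': {'$': {}}, 'r': {'$': {}}}}}, with '$' marking the ends of complete words.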

And now we have everything we need. We know that a leaf node has a lifetime of 1 and a maximum probability of 1. If we have all of the lifetimes and maximum probabilities for the children of a given prefix, we can find the optimal probabilities to assign to each child to maximize the lifetime of that prefix. And from those probabilities, we can calculate the lifetime of the prefix, as well as its maximum probability. Then we just proceed backwards, until we’ve calculated the whole optimal conditional distribution, from which we can extract the distribution over valid words when starting from the root.
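For concreteness, here’s a sketch of how those pieces are meant to fit together, written as a plain recursion over the trie rather than an explicit level traversal (equivalent for our purposes, since each prefix only needs its children’s results). Consider it a sketch, not a debugged artifact:

def solve(node):
    # Returns (expected lifetime, max probability on any single word)
    # for the subtree hanging off this prefix.
    if not node:
        # The empty dict under '$': the word is fully revealed.
        return 1.0, 1.0
    results = [solve(child) for child in node.values()]
    lifetimes = np.array([lt for lt, _ in results])
    maxes = np.array([m for _, m in results])
    probs, e_life = optimize_probabilities(lifetimes, maxes)
    # Pass up the heaviest single word under this prefix.
    return e_life, np.max(np.multiply(maxes, probs))

root_life, root_max = solve(analyze_vocabulary("words.txt"))  # hypothetical path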

Invincible Contact

Well, I’m having problems with my optimization and I want to get this post out while it still feels fun and motivating. Hopefully I’ll be back sometime with a Part 2, complete with a debugged script for sampling from the Theoretically Perfect Defense Distribution for the n most common English words.

For now, I’ll leave off with a list of open problems in Contact Studies:

  1. Optimal black-box defense without vocabulary density assumption
  2. Factor in defense probability
  3. Factor in effects of guesses eliminating possibilities
  4. Approximate effects of more obscure words on guess time / guess probability
  1. ^

    “cork”.

  2. ^

    Fun management footnote: for defensibility it’s tempting to tack on stuff like “un”, “super”, “ing”, “able” whenever you can, but this can make gameplay feel repetitive.

  3. ^

    Note that this throws out any effects from the secret word being obscure or hard to think of, which is an important effect when playing contact against humans. 

  4. ^

    The cleaner version is to play without the defender revealing the first letter of their word, in which case the “optimal black-box strategy” is picking literally any word.

  5. ^

    Consider the vocabulary [AA, AB, CCCCC]. If the defender survives the guess and the secret word is CCCCC, they will have to reveal a C, which tells the attackers exactly what the secret word is, so they can just guess it next turn, and the extra length doesn’t really matter. This only happens because in the prefix tree, “CCCCC” is a stick, allowing the attackers to get lots of information about the leaves of that subtree before the prefix has reached that point. If we add in the assumption that every subtree has a large number of leaves relative to its depth, then the attackers can never get much of an information lead over the revealed prefix, and picking one of the longest words is optimal (there can’t be a single longest word, by assumption).

  6. ^

    This is apparently what The Math People call element-wise multiplication of vectors/matrices. I use it here because element_wise_product(<M>,<P>) felt a bit too clunky, element_wise(<M>,<P>) wasn’t very clear, and hey, if you’re reading this footnote you learned something extra today.

  7. ^

    I figured there would probably be a name for this and there is, it’s called the standard simplex.

  8. ^
