What is space? What is time? 2024-06-07T22:15:55.951Z
Some perspectives on the discipline of Physics 2024-05-20T18:19:22.429Z

Be careful. Physics seems to be translation invariant, but space is not. You can drop the ball inside or outside the cave and its displacement over time will be the same, but you can definitely tell whether it is in the cave or out of it. You can set your zero point anywhere, but that doesn’t mean that objects in space move when you change your zero point. Space is isotropic. There’s no discernible difference between upward, sideways, or diagonal, but if you measure the sideways distance between two houses to be 40 meters, a person who calls your “sideways” their “up” will measure the distance between the houses to be 40 meters up and down. You can do everything here that you can do there, but here is not there. In the absence of any reference point, no point in space is different from any other point, but in the absence of any reference point there’s no need for physics, because if there were anything to describe with physics, you could use it as a reference point.
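To make the distinction concrete, here is a minimal sketch of the two examples above. Everything in it is an assumption chosen for illustration: uniform gravity with no air resistance, the release heights `cave` and `outside`, and the house coordinates. It shows that the ball's displacement is the same in both places while its position is not, and that rotating or shifting the coordinate system changes the labels on the houses but not the 40 meters between them.

```python
import math

G = 9.81  # m/s^2; assumed uniform gravity, no air resistance


def height(z0, t, g=G):
    """Height at time t of a ball released from rest at height z0."""
    return z0 - 0.5 * g * t**2


def displacement(z0, t, g=G):
    """Change in height since release; z0 cancels out up to rounding."""
    return height(z0, t, g) - z0


def rotate(point, theta):
    """Coordinates of a 2D point after rotating the axes' labels by theta."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def shift(point, offset):
    """Coordinates of a 2D point after moving the zero point by -offset."""
    return (point[0] + offset[0], point[1] + offset[1])


def separation(a, b):
    """Distance between two points; independent of origin and orientation."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


# Hypothetical release heights (meters): deep in the cave vs. just outside it.
cave, outside = -100.0, 2.0

# Hypothetical houses, 40 m apart along one observer's "sideways" axis.
house_a, house_b = (0.0, 0.0), (40.0, 0.0)
```

Comparing `displacement(cave, t)` with `displacement(outside, t)` gives the same motion, while `height(cave, t)` and `height(outside, t)` still tell you which drop happened where. Likewise `rotate(house_b, math.pi / 2)` moves the 40 meters from the first coordinate to the second, but `separation` is unchanged.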

I suppose you could try to define space as the thing you can move around in without changing your physics, but the usual strategy is to define physics and derive conservation of momentum from the fact that your physics is translation invariant.
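For reference, the usual derivation can be sketched as follows. The setup is an assumption for illustration, not something from the post: $N$ particles in one dimension with masses $m_i$, positions $x_i$, and a potential $V$ that depends only on the positions.

```latex
% Assumed Lagrangian for N particles in one dimension
L = \sum_{i=1}^{N} \tfrac{1}{2} m_i \dot{x}_i^{2} - V(x_1, \dots, x_N)

% Translation invariance: shifting every particle by the same \epsilon leaves V unchanged
V(x_1 + \epsilon, \dots, x_N + \epsilon) = V(x_1, \dots, x_N)
\quad\Longrightarrow\quad
\sum_{i=1}^{N} \frac{\partial V}{\partial x_i} = 0

% Euler-Lagrange equation for each particle
\frac{d}{dt}\bigl(m_i \dot{x}_i\bigr) = -\frac{\partial V}{\partial x_i}

% Summing over particles: total momentum is conserved
\frac{d}{dt} \sum_{i=1}^{N} m_i \dot{x}_i
  = -\sum_{i=1}^{N} \frac{\partial V}{\partial x_i} = 0
```

This is the simplest case of Noether's theorem: the symmetry (you can move the whole experiment without changing the physics) is the input, and the conserved total momentum is the output.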

The main idea seems good: if you're in a situation where you think you might be in the process of being deceived by an AI, do not relax when the AI provides great evidence that it is not deceiving you. The primary expected outputs of something really good at deception should be things which don't look like deception.

Some of the things in the post don't seem general enough to me, so I want to try to restate them.

Test 1 I like. If you understand all of the gears, you should understand the machine.

Test 2 I like. Tweak the model in a way that should make it worse at process A, and then check the outputs of process A to see whether they change in ways that reveal biases you can triangulate with past outputs in order to catch deception.

Test 3 scares me. This seems to me to be a test for the very specific form of deception "hide capabilities while in testing but not when deployed", which seems like a small domain in deception space.

Test 4 I kind of like. One expected behavior from a deceptive AI is to hide capabilities that scare the people studying it. If you see scary capabilities getting less scary, you should take that as a sign of deception unless you have good reason to believe those capabilities should get worse with scale. Thus it is a good idea to find out ahead of time which things should get worse with scale. I do worry that this paradigm relies too much on AI that improves via "more dakka" (e.g., more GPUs, larger datasets, better processors) rather than via algorithmic improvements or something similar, in which case I don't know that people will have a good handle on which capabilities should get worse. The "scaling helps" section also worries me for this reason.

In the section "deceptive models know this" you suggest "deciding on a level of deceptive capabilities that’s low enough that we trust models not to be deceptively aligned". Won't that just select for models that start deceiving well earlier? I think I may be misinterpreting what you mean by "deceptive capabilities" here. Maybe your "deceptive capabilities" are "smoke" and actual deception is "fire", but I'm not sure what deceptive capabilities that aren't deception would be.

The ad market amounts to an auction for societal control. An advertisement is an instrument by which an entity attempts to change the future behavior of many other entities. Generally it is an instrument for a company to make people buy its stuff. There is also political advertising, which is an instrument to make people take actions in support of a cause or a person seeking power. Advertising of any type is not known for making reason-based arguments. I recall from an interview that this influence/prediction market was one of the author's major objections to the new order. If there is to be a market where companies and political-power-seekers bid for the ability to change the actions of the seething masses according to their own goals, the author felt that the seething masses should have some say in it.