Posts

5 Physics Problems 2024-03-18T08:05:45.971Z
Coalescer Models 2024-01-17T06:39:30.102Z
Embedded Agents are Quines 2023-12-12T04:57:31.588Z
Logical Share Splitting 2023-09-11T04:08:32.350Z
DaemonicSigil's Shortform 2023-08-20T22:58:57.385Z
Problems with Robin Hanson's Quillette Article On AI 2023-08-06T22:13:43.654Z
Contra Anton 🏴‍☠️ on Kolmogorov complexity and recursive self improvement 2023-06-30T05:15:38.701Z
Time and Energy Costs to Erase a Bit 2023-05-06T23:29:42.909Z
The Brain is Not Close to Thermodynamic Limits on Computation 2023-04-24T08:21:44.727Z
Self-censorship is probably bad for epistemology. Maybe we should figure out a way to avoid it? 2023-03-19T09:04:42.360Z
Gatekeeper Victory: AI Box Reflection 2022-09-09T21:38:39.218Z
Alien Message Contest: Solution 2022-07-13T04:07:06.010Z
Contest: An Alien Message 2022-06-27T05:54:54.144Z
Tessellating Hills: a toy model for demons in imperfect search 2020-02-20T00:12:50.125Z

Comments

Comment by DaemonicSigil on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-24T22:16:27.170Z · LW · GW

So once that research is finished, assuming it is successful, you'd agree that many worlds would end up using fewer bits in that case? That seems like a reasonable position to me, then! (I find the partial-trace kinds of arguments that people make pretty convincing already, but it's reasonable not to.)

Comment by DaemonicSigil on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-12T19:57:07.011Z · LW · GW

MW theories have to specify when and how decoherence occurs. Decoherence isn't simple.

They don't actually. One could equally well say: "Fundamental theories of physics have to specify when and how increases in entropy occur. Thermal randomness isn't simple." This is wrong because once you've described the fundamental laws, and they happen to be reversible and not too simple, increasing entropy from a low-entropy initial state is a natural consequence of those laws. Similarly, decoherence is a natural consequence of the laws of quantum mechanics (with a not-too-simple Hamiltonian) applied to a low-entropy initial state.

Comment by DaemonicSigil on Ackshually, many worlds is wrong · 2024-04-11T20:38:27.399Z · LW · GW

Good post, and I basically agree with this. I do think it's good to mostly focus on the experimental implications when talking about these things. When I say "many worlds", what I primarily mean is that I predict that we should never observe a spontaneous collapse, even if we do crazy things like putting conscious observers into superposition, or putting large chunks of the gravitational field into superposition. So if we ever did observe such a spontaneous collapse, that would falsify many worlds.

Comment by DaemonicSigil on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-11T20:23:58.772Z · LW · GW

The amount of calculation isn't so much the concern here as the number of bits used to implement that calculation. And there's no law that forces the number of bits encoding the two computations to be equal. Copenhagen can just waste bits on computations that MWI doesn't have to do.

In particular, I mentioned earlier that Copenhagen has to have rules for when measurements occur and what basis they occur in. How does MWI incur a similar cost? What does MWI have to compute that Copenhagen doesn't that uses up the same number of bits of source code?

Like, yes, an expected-value-maximizing agent that has a utility function similar to ours might have to do some computations that involve identifying worlds, but the complexity of the utility function doesn't count against the complexity of any particular theory. And an expected value maximizer is naturally going to try and identify its zone of influence, which is going to look like a particular subset of worlds in MWI. But this happens automatically exactly because the thing is an EV-maximizer, and not because the laws of physics incurred extra complexity in order to single out worlds.

Comment by DaemonicSigil on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-11T01:54:07.185Z · LW · GW

Right, so we both agree that the randomness used to determine the result of a measurement in Copenhagen, and the information required to locate yourself in MWI, come to the same number of bits. But the argument for MWI was never that it had an advantage on this front, but rather that Copenhagen used up some extra bits in the machine that generates the output tape in order to implement the wavefunction collapse procedure. (Not to decide the outcome of the collapse, those random bits are already spoken for. Just the source code of the procedure that collapses the wavefunction and such.) Such code has to answer questions like: Under what circumstances does the wavefunction collapse? What determines the basis the measurement is made in? There needs to be code for actually projecting the wavefunction and then re-normalizing it. This extra complexity is what people mean when they say that collapse theories are less parsimonious/have extra assumptions.

Comment by DaemonicSigil on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-10T18:12:07.454Z · LW · GW

Disagree.

If you're talking about the code complexity of "interleaving": If the Turing machine simulates quantum mechanics at all, it already has to "interleave" the representations of states for tiny things like an electron being in a superposition of spin states or whatever. This must be done in order to agree with experimental results. And then at that point not having to put in extra rules to "collapse the wavefunction" makes things simpler.

If you're talking about the complexity of locating yourself in the computation: Inferring which world you're in is equally complex to inferring which way all the Copenhagen coin tosses came up. It's the same number of bits. (In practice, we don't have to identify our location down to a single world, just as we don't care about the outcome of all the Copenhagen coin tosses.)

Comment by DaemonicSigil on General Thoughts on Secular Solstice · 2024-03-24T18:39:28.981Z · LW · GW

This notion of faith seems like an interesting idea, but I'm not 100% sure I understand it well enough to actually apply it in an example.

Suppose Descartes were to say: "Y'know, even if there were an evil Daemon fooling every one of my senses for every hour of the day, I can still know what specific illusions the Daemon is choosing to show me. And hey, actually, it sure does seem like there are some clear regularities and patterns in those illusions, so I can sometimes predict what the Daemon will show me next. So in that sense it doesn't matter whether my predictions are about the physical laws of a material world, or just patterns in the thoughts of an evil being. My mental models seem to be useful either way."

Is that what faith is?

If a rationalist hates the idea of heat death enough that they fool themselves into thinking that there must be some way that the increase in entropy can be reversed, is that an example of not seeing the world as it is? How does this flow from a lack of the first thing?

Comment by DaemonicSigil on "Deep Learning" Is Function Approximation · 2024-03-24T02:36:18.486Z · LW · GW

To be clear, I'm definitely pretty sympathetic to TurnTrout's type error objection. (Namely: "If the agent gets a high reward for ingesting superdrug X, but did not ingest it during training, then we shouldn't particularly expect the agent to want to ingest superdrug X during deployment, even if it realizes this would produce high reward.") But just rereading what Zack has written, it seems quite different from what TurnTrout is saying and I still stand by my interpretation of it.

  • eg. Zack writes: "obviously the line itself does not somehow contain a representation of general squared-error-minimization". So in this line fitting example, the loss function, i.e. "general squared-error-minimization", refers to the general function $(\hat{y}, y) \mapsto (\hat{y} - y)^2$, and not to the curried loss $f \mapsto \sum_i (f(x_i) - y_i)^2$ with the training data baked in.
  • And when he asks why one would even want the neural network to represent the loss function, there's a pretty obvious answer of "well, the loss function contains many examples of outcomes humans rated as good and bad and we figure it's probably better if the model understands the difference between good and bad outcomes for this application." But this answer only applies to the curried loss.

I wasn't trying to sign up to defend everything Eliezer said in that paragraph, especially not the exact phrasing, so can't reply to the rest of your comment which is pretty insightful.

Comment by DaemonicSigil on "Deep Learning" Is Function Approximation · 2024-03-22T02:22:14.224Z · LW · GW

It's the same thing for piecewise-linear functions defined by multi-layer parameterized graphical function approximators: the model is the dataset. It's just not meaningful to talk about what a loss function implies, independently of the training data. (Mean squared error of what? Negative log likelihood of what? Finish the sentence!)

This confusion about loss functions...

I don't think this is a confusion, but rather a mere difference in terminology. Eliezer's notion of "loss function" is equivalent to Zack's notion of "loss function" curried with the training data. Thus, when Eliezer writes about the network modelling or not modelling the loss function, this would include modelling the process that generated the training data.
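As a minimal illustration of the terminological point (toy data and a stand-in model, nothing from either post):

```python
import numpy as np

def squared_error(y_pred, y_true):
    # "Loss function" in Zack's sense: a rule for scoring predictions against targets,
    # with no dataset built in.
    return np.mean((y_pred - y_true) ** 2)

# Toy training data.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.1, 0.9, 2.1, 2.9])

def curried_loss(model):
    # "Loss function" in Eliezer's sense: the training data is already baked in, so the
    # loss is a function of the model alone -- and implicitly of whatever process
    # generated xs and ys.
    return squared_error(model(xs), ys)

line = lambda x: 1.0 * x  # stand-in for a fitted model
print(curried_loss(line))
```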

Comment by DaemonicSigil on Deconstructing Bostrom's Classic Argument for AI Doom · 2024-03-12T05:31:28.596Z · LW · GW

Could you give an example of knowledge and skills not being value neutral?

(No need to do so if you're just talking about the value of information depending on the values one has, which is unsurprising. But it sounds like you might be making a more substantial point?)

Comment by DaemonicSigil on Counting arguments provide no evidence for AI doom · 2024-02-28T00:56:46.169Z · LW · GW

Fair enough for the alignment comparison, I was just hoping you could maybe correct the quoted paragraph to say "performance on the hold-out data" or something similar.

(The reason to expect more spread would be that training performance can't detect overfitting but performance on the hold-out data can. I'm guessing some of the nets trained in Miller et al did indeed overfit (specifically the ones with lower performance).)

Comment by DaemonicSigil on Counting arguments provide no evidence for AI doom · 2024-02-28T00:18:52.239Z · LW · GW

More generally, John Miller and colleagues have found training performance is an excellent predictor of test performance, even when the test set looks fairly different from the training set, across a wide variety of tasks and architectures.

Seems like figure 1 from Miller et al is a plot of test performance vs. "out of distribution" test performance. One might expect plots of training performance vs. "out of distribution" test performance to have more spread.

Comment by DaemonicSigil on DaemonicSigil's Shortform · 2024-02-19T21:04:56.235Z · LW · GW

In this context, we're imitating some probability distribution, and the perturbation means we're slightly adjusting the probabilities, making some of them higher and some of them lower. The adjustment is small in a multiplicative sense not an additive sense, hence the use of exponentials. Just as a silly example, maybe I'm training on MNIST digits, but I want the 2's to make up 30% of the distribution rather than just 10%. The math described above would let me train a GAN that generates 2's 30% of the time.

I'm not sure what is meant by "the difference from a gradient in SGD", so I'd need more information to say whether it is different from a perturbation or not. But probably it's different: perturbations in the above sense are perturbations in the probability distribution over the training data.

Comment by DaemonicSigil on DaemonicSigil's Shortform · 2024-02-19T04:14:16.969Z · LW · GW

Perturbation Theory in Machine Learning

Linkpost for: https://pbement.com/posts/perturbation_theory.html

In quantum mechanics there is this idea of perturbation theory, where a Hamiltonian $H$ is perturbed by some change $\Delta H$ to become $H + \Delta H$. As long as the perturbation is small, we can use the technique of perturbation theory to find out facts about the perturbed Hamiltonian, like what its eigenvalues should be.

An interesting question is if we can also do perturbation theory in machine learning. Suppose I am training a GAN, a diffuser, or some other machine learning technique that matches an empirical distribution. We'll use a statistical physics setup to say that the empirical distribution is given by:

$$p(x) = \frac{e^{-H(x)}}{Z}$$

Note that we may or may not have an explicit formula for $H(x)$. The distribution of the perturbed Hamiltonian is given by:

$$p'(x) = \frac{e^{-H(x) - \Delta H(x)}}{Z'}$$

The loss function of the network will look something like:

$$L(\theta) = \frac{1}{N}\sum_{i=1}^{N} \ell(x_i, \theta)$$

Where $\theta$ are the network's parameters, and $\ell$ is the per-sample loss function which will depend on what kind of model we're training. Now suppose we'd like to perturb the Hamiltonian. We'll assume that we have an explicit formula for $\Delta H(x)$. Then the loss can be easily modified as follows:

$$L'(\theta) = \frac{1}{N}\sum_{i=1}^{N} e^{-\Delta H(x_i)}\,\ell(x_i, \theta)$$

If the perturbation is too large, then the exponential causes the loss to be dominated by a few outliers, which is bad. But if the perturbation isn't too large, then we can perturb the empirical distribution by a small amount in a desired direction.

One other thing to consider is that the exponential will generally increase variance in the magnitude of the gradient. To partially deal with this, we can define an adjusted batch size that counts each sample with weight $e^{-\Delta H(x_i)}$ rather than 1:

$$B_{\text{adj}} = \sum_{i \in \text{batch}} e^{-\Delta H(x_i)}$$

Then by varying the actual number of samples we put into a batch, we can try to maintain a more or less constant adjusted batch size. One way to do this is to define an error variable, err = 0. At each step, we add a constant B_avg to the error. Then we add samples to the batch until adding one more sample would cause the adjusted batch size to exceed err. Subtract the adjusted batch size from err, train on the batch, and repeat. The error carries over from one step to the next, and so the adjusted batch sizes should average to B_avg.
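Here's a minimal sketch of the reweighted loss and the batching bookkeeping described above, under my reading that the adjusted batch size is the running sum of the $e^{-\Delta H}$ weights (the perturbation and per-sample loss below are made-up placeholders):

```python
import numpy as np

def delta_H(x):
    # Placeholder perturbation: upweights samples near the origin (illustration only).
    return 0.1 * float(np.sum(x ** 2))

def perturbed_loss(batch, per_sample_loss, params):
    # Multiplicatively reweight the empirical distribution by exp(-delta_H).
    weights = np.exp(-np.array([delta_H(x) for x in batch]))
    losses = np.array([per_sample_loss(x, params) for x in batch])
    return float(np.mean(weights * losses))

def adjusted_batches(samples, B_avg):
    # Emit batches whose adjusted size (sum of exp(-delta_H) weights) averages to B_avg,
    # carrying the rounding error over from one batch to the next.
    err, i = 0.0, 0
    while i < len(samples):
        err += B_avg
        batch, adj_size = [], 0.0
        while i < len(samples):
            w = float(np.exp(-delta_H(samples[i])))
            if batch and adj_size + w > err:
                break
            batch.append(samples[i])
            adj_size += w
            i += 1
        err -= adj_size
        yield np.array(batch)

# Dummy usage: quadratic per-sample loss around some parameter vector.
data = np.random.randn(1000, 2)
params = np.zeros(2)
for batch in adjusted_batches(data, B_avg=32.0):
    loss = perturbed_loss(batch, lambda x, p: float(np.sum((x - p) ** 2)), params)
```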

Comment by DaemonicSigil on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2024-02-17T03:11:44.623Z · LW · GW

I don't think we should consider the centroid important in describing the LLM's "ontology". In my view, the centroid just points in the direction of highest density of words in the LLM's space of concepts. Let me explain:

The reason that embeddings are spread out is to allow the model to distinguish between words. So intuitively, tokens with largeish dot product between them correspond to similar words. Distinguishability of tokens is a limited resource, so the training process should generally result in a distribution of tokens that uses this resource in an efficient way to encode the information needed to predict text. Consider a language with 100 words for snow. Probably these all end up with similar token vectors, with large dot products between them. Exactly which word for snow someone writes is probably not too important for predicting text. So the training process makes those tokens relatively less distinguishable from each other. But the fact that there are 100 tokens all pointing in a similar direction means that the centroid gets shifted in that direction.

Probably you can see where this is going now. The centroid gets shifted in directions where there are many tokens that the network considers to be all similar in meaning, directions where human language has allocated a lot of words, while the network considers the differences in shades of meaning between these words to be relatively minor.
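As a toy numerical illustration of this intuition (purely synthetic embeddings, nothing from an actual LLM):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# 900 "ordinary" tokens spread roughly uniformly over directions.
spread = rng.normal(size=(900, dim))
spread /= np.linalg.norm(spread, axis=1, keepdims=True)

# 100 near-synonym tokens ("words for snow") clustered around one direction.
snow_dir = rng.normal(size=dim)
snow_dir /= np.linalg.norm(snow_dir)
cluster = snow_dir + 0.1 * rng.normal(size=(100, dim))
cluster /= np.linalg.norm(cluster, axis=1, keepdims=True)

embeddings = np.vstack([spread, cluster])
centroid = embeddings.mean(axis=0)

# The centroid ends up pointing mostly toward the dense cluster of similar tokens.
print(np.dot(centroid / np.linalg.norm(centroid), snow_dir))  # close to 1
```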

Comment by DaemonicSigil on Dreams of AI alignment: The danger of suggestive names · 2024-02-13T07:17:17.579Z · LW · GW

Mathematically, convergence just means that the distance to some limit point goes to 0 in the limit. There's no implication that the limit point has to be unique, or optimal. Eg. in the case of Newton fractals, there are multiple roots and the trajectory converges to one of the roots, but which one it converges to depends on the starting point of the trajectory. Once the weight updates become small enough, we should say the net has converged, regardless of whether it achieved the "optimal" loss or not.

If even "converged" is not good enough, I'm not sure what one could say instead. Probably the real problem in such cases is people being doofuses, and probably they will continue being doofuses no matter what word we force them to use.

Comment by DaemonicSigil on Dreams of AI alignment: The danger of suggestive names · 2024-02-11T20:30:25.832Z · LW · GW

On the actual object level for the word "optimal", people already usually say "converged" for that meaning and I think that's a good choice.

Comment by DaemonicSigil on And All the Shoggoths Merely Players · 2024-02-11T20:17:13.540Z · LW · GW

Relatedly, you bring up adversarial examples in a way that suggests that you think of them as defects of a primitive optimization paradigm, but it turns out that adversarial examples often correspond to predictively useful features that the network is actively using for classification, despite those features not being robust to pixel-level perturbations that humans don't notice—which I guess you could characterize as "weird squiggles" from our perspective, but the etiology of the squiggles presents a much more optimistic story about fixing the problem with adversarial training than if you thought "squiggles" were an inevitable consequence of using conventional ML techniques.

Train two distinct classifier neural-nets on an image dataset. Set aside one as the "reference net". The other net will be the "target net". Now perturb the images so that they look the same to humans, and also get classified the same by the reference net. So presumably both the features humans use to classify, and the squiggly features that neural nets use should be mostly unchanged. Under these constraints on the perturbation, I bet that it will still be possible to perturb images to produce adversarial examples for the target net.

Literally. I will bet money that I can still produce adversarial examples under such constraints if anyone wants to take me up on it.
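For concreteness, here is a rough sketch of how such a constrained attack could be set up, assuming hypothetical `target_net`, `reference_net`, `image`, and `label` objects; this is just my sketch of the bet's conditions, not a tuned or tested attack:

```python
import torch
import torch.nn.functional as F

def constrained_adversarial_example(image, label, target_net, reference_net,
                                    eps=8 / 255, steps=100, lr=0.01, lam=10.0):
    """Search for a perturbation that flips the target net's prediction while
    (approximately) preserving the reference net's output distribution and staying
    inside an L-infinity ball small enough that humans see no difference."""
    ref_logits = reference_net(image).detach()
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        # Push the target net away from the true label...
        attack_loss = -F.cross_entropy(target_net(adv), label)
        # ...while penalizing any drift in the reference net's predictions.
        ref_penalty = F.kl_div(F.log_softmax(reference_net(adv), dim=-1),
                               F.softmax(ref_logits, dim=-1), reduction="batchmean")
        loss = attack_loss + lam * ref_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return (image + delta).detach().clamp(0, 1)
```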

Comment by DaemonicSigil on DaemonicSigil's Shortform · 2024-02-04T08:50:40.963Z · LW · GW

You Can Just Put an Endpoint Penalty on Your Wasserstein GAN

Linkpost for: https://pbement.com/posts/endpoint_penalty.html

When training a Wasserstein GAN, there is a very important constraint that the discriminator network must be a Lipschitz-continuous function. Roughly we can think of this as saying that the output of the function can't change too fast with respect to position, and this change must be bounded by some constant $K$. If the discriminator function is given by $f(x)$, then we can write the Lipschitz condition for the discriminator as:

$$|f(x_1) - f(x_2)| \leq K \|x_1 - x_2\|$$

Usually this is implemented as a gradient penalty. People will take a gradient (higher order, since the loss already has a gradient in it) of this loss (for $K = 1$):

$$L_{\mathrm{GP}} = \lambda\, \mathbb{E}_{\hat{x}}\!\left[\left(\|\nabla_{\hat{x}} f(\hat{x})\| - 1\right)^2\right]$$

In this expression $\hat{x}$ is sampled as $\hat{x} = \epsilon x_{\text{real}} + (1 - \epsilon) x_{\text{gen}}$, a random mixture of a real and a generated data point.

But this is complicated to implement, involving a higher order gradient. It turns out we can also just impose the Lipschitz condition directly, via the following penalty:

$$L_{\mathrm{EP}} = \lambda\, \mathbb{E}_{x_1, x_2}\!\left[\|x_1 - x_2\|\,\mathrm{relu}\!\left(\frac{|f(x_1) - f(x_2)|}{\|x_1 - x_2\| + \epsilon} - K\right)^{2}\right]$$

Except to prevent issues where we're maybe sometimes dividing by zero, we throw in an $\epsilon$ in the denominator and a reweighting factor of $\|x_1 - x_2\|$ (not sure if that is fully necessary, but the intuition is that making sure the Lipschitz condition is enforced for points at large separation is the most important thing).

For the overall loss, we compare all pairwise distances between real data, generated data, and a random mixture of them. Probably it improves things to add one or two more random mixtures in, but I'm not sure and haven't tried it.

In any case, this seems to work decently well (tried on MNIST), so it might be a simpler alternative to gradient penalty. I also used instance noise, which, as pointed out here, is amazingly good for preventing mode collapse and just generally makes training easier. So yeah, instance noise is great and you should use it. And if you really don't want to figure out how to do higher order gradients in pytorch for your WGAN, you've still got options.
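For anyone who wants to see the shape of this in code, here is a minimal PyTorch-style sketch of the endpoint penalty as reconstructed above (`disc`, `real`, and `fake` are placeholders, and corresponding pairs are compared rather than all pairs, purely for brevity):

```python
import torch

def endpoint_penalty(disc, x_a, x_b, K=1.0, eps=1e-6):
    # Penalize pairs whose difference quotient |f(a) - f(b)| / ||a - b|| exceeds K,
    # reweighted by the separation ||a - b||.
    dist = (x_a - x_b).flatten(1).norm(dim=1)
    diff = (disc(x_a) - disc(x_b)).flatten().abs()
    violation = torch.relu(diff / (dist + eps) - K)
    return (dist * violation ** 2).mean()

def discriminator_loss(disc, real, fake, lam=10.0):
    # Wasserstein critic loss plus endpoint penalties on real/fake, real/mixture,
    # and fake/mixture pairs.
    alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    mix = alpha * real + (1 - alpha) * fake
    w_loss = disc(fake).mean() - disc(real).mean()
    penalty = (endpoint_penalty(disc, real, fake)
               + endpoint_penalty(disc, real, mix)
               + endpoint_penalty(disc, fake, mix))
    return w_loss + lam * penalty
```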

Comment by DaemonicSigil on My thoughts on the Beff Jezos - Connor Leahy debate · 2024-02-04T01:59:15.048Z · LW · GW

Yes. I think Beff was speaking imprecisely there. In order to be consistent with what he's written elsewhere, he should have said something like: "maximizing the rate of free energy dissipation".

Comment by DaemonicSigil on My thoughts on the Beff Jezos - Connor Leahy debate · 2024-02-03T23:21:09.370Z · LW · GW

C: You heard it, e/acc isn't about maximizing entropy [no shit?!]

B: No, it's about maximizing the free energy

C: So e/acc should want to collapse the false vacuum?

Holy mother of bad faith. Rationalists/lesswrongers have a problem with saying obviously false things, and this is one of those.

It's in line with what seems like Connor's debate strategy - make your opponent define their views and their terminal goal in words, and then pick apart that goal by pushing it to the maximum. Embarrassing.

I agree with you that Connor performed very poorly in this debate. But this one is actually fair game. If you look at Beff's writings about "thermodynamic god" and these kinds of things, he talks a lot about how these ideas are supported by physics and the Crooks fluctuation theorem. Normally in a debate if someone says they value X, you interpret that as "I value X, but other things can also be valuable and there might be edge cases where X is bad and I'm reasonable and will make exceptions for those."

But physics doesn't have a concept of "reasonable". The ratio between the forward and backward probabilities in the Crooks fluctuation theorem is exponential in the amount of entropy produced. It's not exponential in the amount of entropy produced plus some correction terms to add in reasonable exceptions for edge cases. Given how much Beff has emphasized that his ideas originated in physics, I think it's reasonable to take him at his word and assume that he really is talking about the thing in the exponent of the Crooks fluctuation theorem. And then the question of "so hey, it sure does look like collapsing the false vacuum would dissipate an absolutely huge amount of free energy" is a very reasonable one to ask.

Comment by DaemonicSigil on cold aluminum for medicine · 2023-12-19T19:56:38.177Z · LW · GW

If you care about the heat coming out on the hot side rather than the heat going in on the cold side (i.e. the application is heat pump rather than refrigerator), then the theoretical limit is always greater than 1, since the work done gets added onto the heat absorbed:

$$\mathrm{COP}_{\text{heating}} = \frac{Q_h}{W} = \frac{Q_c + W}{W} = \mathrm{COP}_{\text{cooling}} + 1 > 1$$

Cooling performance can absolutely be less than 1, and often is for very cold temperatures.

Comment by DaemonicSigil on cold aluminum for medicine · 2023-12-17T10:27:26.199Z · LW · GW

a few kW of resistive loss

Is this already accounting for the energy penalty of cooling at cryogenic temperatures? Going from 20 K up to room temperature costs more than a factor of 10: you pay the energy cost once in resistive losses and then more than 10 times over again in pumping the generated entropy out of the cold bath. I guess the electricity bill is not a huge constraint on these things, but it could mean a higher cost for cooling equipment?
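As a rough back-of-envelope for that factor (using the ideal Carnot limit; real cryocoolers are several times worse than this):

```python
T_cold = 20.0   # K, operating temperature of the cold stage
T_hot = 300.0   # K, room temperature

# Ideal (Carnot) work needed to pump 1 J of heat out of the cold stage:
work_per_joule = (T_hot - T_cold) / T_cold
print(work_per_joule)  # 14.0, so each joule of resistive loss costs >14 J at the wall
```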

Comment by DaemonicSigil on re: Yudkowsky on biological materials · 2023-12-11T20:39:21.674Z · LW · GW

In general, the factors that govern the macroscopic strength of materials can often have surprisingly little to do with the strength of the bonds holding them together. A big part of a material's tensile strength is down to whether it forms cracks and how those cracks propagate. I predict many LWers would enjoy reading The New Science of Strong Materials which is an excellent introduction to materials science and its history. (Cellulose is mentioned, and the most frequent complaint about it as an engineering material lies in its tendency to absorb water.)

It's actually not clear to me why Yudkowsky thinks that ridiculously high macroscopic physical strength is so important for establishing an independent nanotech economy. Is he imagining that trees will be out-competed by solar collectors rising up on stalks of diamond taller than the tallest tree trunk? But the trees themselves can be consumed for energy, and to achieve this, nanobots need only reach them on the ground. Once the forest has been eaten, a solar collector lying flat on the ground works just as well. One legitimate application for covalently bonded structures is operating at very high temperatures, which would cause ordinary proteins to denature. In those cases the actual strength of the individual bonds does matter more.

Comment by DaemonicSigil on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-07T08:54:35.180Z · LW · GW

Another effect that is very important in determining how proteins fold is the fact that they're dissolved in liquid water, and so hydrophilic parts of the protein want to be on the surface, while hydrophobic parts want to be on the inside, near other hydrophobic parts. This is largely an entropic force/effect.

Some other things that are true:

  • 100% of the bonds in hydrogen gas are covalent.
  • Most of the fundamental particles in a water molecule are held together by the strong nuclear force, which is a much stronger binding than covalent bonds.
  • If you pull on a protein and stretch it apart, it looks like a long chain (maybe with a few crosslinks).
    • This is a non-zero amount of structure, but it looks nothing like the fully folded protein.
    • If we ask how the chain with the crosslinks is held together, the answer is covalent bonds that were either in the amino acids originally, or were formed when the ribosome assembled them into a chain (or, in the case of the disulfide crosslinks, were formed during the folding process).
    • But if we ask where the rest of the protein's structure came from, then the answer is hydrogen bonds and hydrophobic/hydrophilic forces.

Comment by DaemonicSigil on Why not electric trains and excavators? · 2023-11-22T02:03:57.595Z · LW · GW

The diesel just drives the "generator" that then powers electric motors that drive the wheels.

That's exactly what "electric transmission" means, no?

Comment by DaemonicSigil on Why not electric trains and excavators? · 2023-11-21T05:34:41.100Z · LW · GW

Wait, diesel-electric just means that they use an electric transmission, right? So 100% of the energy driving the locomotive still ultimately comes from burning diesel. IIRC the carbon footprint of electric cars is dependent on how your local power is generated. To be worse than internal combustion, there needs to be a high fraction of coal in the mix. Even the power plants that burn stuff are generally more efficient than internal combustion engines because they're larger so less heat is lost to conduction and they also burn hotter. So the actual reason for higher emissions would just be that coal has more carbon in it per joule than gasoline does. That's all just going off of memory, please correct me if I'm wrong.

It actually seems like a diesel-electric fleet would be almost ideal for converting rail lines to electric. If upgrading a locomotive to have brushes and some associated power electronics is not too expensive, then you can get a hybrid that will still operate as a normal diesel locomotive on lines that haven't been electrified yet, but will operate electrically on lines that have been, saving on fuel costs.

Comment by DaemonicSigil on Muireall's Shortform · 2023-11-09T06:16:38.162Z · LW · GW

Good point, I had briefly thought of this when answering, and it was the reason I mentioned constant factors in my comment. However, on closer inspection:

  1. The "constant" factor is actually only nearly constant.

  2. It turns out to be bigger than 10.

Explanation:

$10^{-9}$ is about 6 sigma. To generalize, let's say we have $n$ sigma, where $n$ is some decently large number so that the position-only Boltzmann distribution gives an extremely tiny probability of error.

So we have the following probability of error for the position-only Boltzmann distribution:

$$P_1 = \int_n^\infty \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx$$

Our toy model for this scenario is that rather than just sampling position, we jointly sample position and momentum, and then compute the amplitude. Equivalently, we sample position twice, and add it in quadrature to get amplitude. This gives a probability of:

$$P_2 = \iint_{x^2 + p^2 > n^2} \frac{1}{2\pi}\, e^{-(x^2 + p^2)/2}\,dx\,dp = e^{-n^2/2}$$

Since we took $n$ to be decently large, we can approximate the integrand in our expression for $P_1$ with an exponential distribution (basically, we Taylor expand the exponent):

$$P_1 \approx \frac{1}{n\sqrt{2\pi}}\, e^{-n^2/2}$$

Result: $P_2$ is larger than $P_1$ by a factor of $n\sqrt{2\pi}$. While the $\sqrt{2\pi}$ is constant, $n$ grows (albeit very slowly) as the probability of error shrinks. Hence "nearly constant". For this problem, where $n \approx 6$, we get a factor of about 15, so a probability of roughly $1.5 \times 10^{-8}$ per try.

Why is this worth thinking about? If we just sample at a single point in time, and consider only the position at that time, then we get the original $10^{-9}$ per try. This is wrong because momentum gets to oscillate and turn into displacement, as you've already pointed out. On the other hand, if we remember the equipartition theorem, then we might reason that since the variance of amplitude is twice the variance of position, the probability of error is massively amplified. We don't have to naturally get a 6 sigma displacement. We only need to get a roughly $n/\sqrt{2}$ sigma displacement and wait for it to rotate into place. This is wrong because we're dealing with rare events here, and for the above scenario to work out, we actually need to simultaneously get displacement and momentum, both of which are rare and independent.

So it's quite interesting that the actual answer is in between, and comes, roughly speaking, from rotating the tail of the distribution around by a full circle of circumference $2\pi n$.
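(As a quick numerical check of that factor, assuming a standard normal position and momentum and $n = 6$:)

```python
import numpy as np
from scipy import stats

n = 6.0
p_1d = stats.norm.sf(n)         # position-only tail: P(x > n), about 1e-9
p_2d = np.exp(-n ** 2 / 2)      # 2D radial tail: P(sqrt(x^2 + p^2) > n)
print(p_1d, p_2d, p_2d / p_1d)  # ratio is about n * sqrt(2 * pi), i.e. roughly 15
```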

Anyway, very cool and interesting question! Thanks for sharing it.

Comment by DaemonicSigil on Muireall's Shortform · 2023-11-08T07:22:52.863Z · LW · GW

EDIT: added spoiler formatting

is the RMS instantaneous velocity. Taking pictures at intervals gives an averaged velocity, which is slower because the particle wastes some time going in different directions that cancel out. is going to be near the instantaneous velocity, but still a little slower, since the velocity is still going to decay slightly, even over 1/10th of the decay time. is going to be significantly slower. If we make the time step even slower than 10 ms, we expect the RMS velocity to go roughly as the inverse square root of the timestep. Anyway, the answer should be 3:

Comment by DaemonicSigil on Muireall's Shortform · 2023-11-08T07:12:00.607Z · LW · GW

EDIT: added spoiler formatting

I'm going to guess 3. Reasoning: I'm sure right away that 1, 2 are wrong. Reason: If you leave the thing sitting for long enough then obviously it's going to eventually fail. So 2 is wrong and 1 is even wronger. I'm also pretty sure that 5 is wrong. Something like 5 is true for the velocity (or rather, the estimated velocity based on measuring displacement after a given time $\Delta t$) of a particle undergoing Brownian motion, but I don't think that's a good model for this situation. For one thing, on a small time-scale, Brownian velocities don't actually become infinite, instead we see that they're actually caused by individual molecules bumping into the object, and all energies remain finite.

3 and 4 are both promising because they actually make use of the time-scales given in the problem. 4 seems wrong because, if we imagined that the relaxation timescale was instead 1 second, then after looking at the position and velocity once, the system oscillates at that same amplitude for a very long time and doesn't get any more tries to beat its previous score. The answer is 3 by elimination, and it also seems intuitive that the relaxation timescale is the one that counts how many tries you get (up to some constant factors).

Comment by DaemonicSigil on Configurations and Amplitude · 2023-10-29T18:57:23.151Z · LW · GW

LessWrong user titotal has written a new and corrected version of this post, and I suggest that anyone wanting to learn this material should learn it from that post instead.

(In case anyone is curious, the main error in this post is that Eliezer describes the mirror as splitting an incoming state $|A\rangle$ into two outgoing states, $|B\rangle + i|C\rangle$. However, the overall magnitude of this outgoing state is $\sqrt{2}$, whereas the incoming state had magnitude $1$. This means that the mirror is described by a non-unitary operator, meaning that it doesn't conserve probability, which is forbidden in quantum mechanics. You can fix this by instead describing the outgoing state as $\frac{1}{\sqrt{2}}\left(|B\rangle + i|C\rangle\right)$.

While it is permitted to do quantum mechanics without normalizing your state (you can get away with just normalizing the probabilities you compute at the end), any operators you apply to your system must still have the correct normalization factors attached to them. Otherwise, you'll get an incorrect answer. To see this, consider an initial state of $|A\rangle + |X\rangle$, where $|A\rangle$ describes the photon heading towards a beam-splitter and $|X\rangle$ describes the photon heading in a different direction entirely. This state is unnormalized, which is fine. But if we allow enough time to pass for a photon at $|A\rangle$ to strike the beam-splitter, then according to Eliezer's description of the beam splitter, we get $|B\rangle + i|C\rangle + |X\rangle$. This suggests that the probabilities of finding the photon at $B$, $C$, or $X$ will be $\frac{1}{3}, \frac{1}{3}, \frac{1}{3}$. But this is incorrect. The correct final state should be $\frac{1}{\sqrt{2}}\left(|B\rangle + i|C\rangle\right) + |X\rangle$, which gives probabilities $\frac{1}{4}, \frac{1}{4}, \frac{1}{2}$. (I am actually rather embarrassed that I didn't notice this error myself, and had to wait for titotal to point it out, though in my defence the last time I read the QM sequence was at least 5 years ago, before I really knew much QM.))

Comment by DaemonicSigil on Do you believe "E=mc^2" is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)? · 2023-10-28T02:52:16.290Z · LW · GW

For an object at rest (and let's assume we don't have to worry too much about gravity), $E = mc^2$ is a correct equation, where $E$ is the overall energy of the object, $m$ is its mass, and $c$ is the speed of light. For an object that's moving, it also has momentum ($p$), and this momentum necessarily implies that the object will have some kinetic energy adding on to its total energy. Special relativity provides a more general version of this equation that is relevant for moving objects as well. Namely:

$$E^2 = (mc^2)^2 + (pc)^2$$

This reduces to the original version if the object is not moving ($p = 0$). Another interesting special case is when $m = 0$. This is the case with photons, for example, which are massless. Then the above reduces to:

$$E = pc$$

So photons have energy proportional to their momentum. (Which turns out to be equivalent to saying that their frequency is inversely proportional to their wavelength. Which has to be true, since they travel at the speed of light.)

Note that in special relativity, energy is frame-dependent, and if you want to deal with quantities that are the same in every frame, you'll want to use the "4-momentum". So one other requirement for using this equation is to fix a frame where we're talking about the energy in that frame.
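A quick numerical sanity check of the general formula, using an electron moving at half the speed of light (illustrative numbers only):

```python
import math

c = 299_792_458.0   # speed of light, m/s
m = 9.1093837e-31   # electron mass, kg
v = 0.5 * c

gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
p = gamma * m * v

E_total = math.sqrt((m * c ** 2) ** 2 + (p * c) ** 2)
print(E_total, gamma * m * c ** 2)  # the two agree: E^2 = (mc^2)^2 + (pc)^2
print(m * c ** 2)                   # rest energy, recovered when p = 0
```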

Source: Physics undergrad degree, several courses covered various aspects of this material. Part of this was learning about the various experiments that were done to establish special relativity. Back in the day, the Michelson-Morley experiment was quite a big piece of evidence, as were the laws of electromagnetism themselves, which were already very-well pinned down by Einstein's time. Now we have much more evidence, what with being able to accelerate particles very close to the speed of light in the LHC and other accelerators.

Comment by DaemonicSigil on Open Thread With Experimental Feature: Reactions · 2023-10-27T16:51:53.682Z · LW · GW

Reacts have been in use for some time now. Having seen various posts and comments with reacts, and how they're being used, I think that the "I checked, it's false" react, and probably also the "I checked, it's true" react, are net-negative. Basically the issue (which I've seen mostly with the "I checked, it's false" react) is that a lot of people are using it as equivalent to a "disagree". (As a rough estimate, I'd guess at least a third of usages have this problem.) Despite the fact that the description says "I looked up sources, did empiricism, checked the equations, etc.", I frequently see it used on statements of opinion, where there are not really any authoritative sources, empiricism might be wildly expensive, unethical, or otherwise impractical, and there are no equations to be found. Even in cases where it's possible to look up the answer, I frequently find myself suspicious about whether or not the person submitting the react actually did so.

But there's no way to ask people who use an "I checked" react to provide their sources/reasoning. Like, do I send a private message? Do I leave a comment under the comment with the react, but addressed to the person who used the react? In practice, I'm not going to do any of those things, and probably most other users won't either, it's just not worth the effort. But it also seems bad to have a bunch of "I checked it's false" tags lying around the site, attached to statements like "beans taste good" or "technological progress will likely stagnate by the year 2100 due to exhaustion of possible discoveries".

If one has actually checked a statement by experiment, by checking the literature, or by mathematics, I think the natural thing to do is to post a comment describing the experiment, linking to sources, or walking through the mathematical reasoning. It's more valuable to readers not to compress this intellectual work into a single opaque react. And if my experiment is flawed, my mathematical argument has a hole, or I cite Deepak Chopra in a discussion on quantum mechanics, the fact that I left a comment means that people can point out my error.

Comment by DaemonicSigil on RA Bounty: Looking for feedback on screenplay about AI Risk · 2023-10-26T17:40:50.165Z · LW · GW

Comments on the realism of the story, feel free to ignore any/all based on the level of realism you're going for:

  • How did the AI copy itself onto a phone using only QR codes, if it was locked inside a Faraday cage and not connected to the internet? A handful of codes presumably aren't enough to contain all of its weights and other data.
  • Part of the lesson of the outcome pump is that the shortest, easiest, and most sure path is not necessarily one that humans would like. In the script, though, sometimes the AI seems to be forming complicated plans that are meant to demonstrate how it is unaligned, but aren't actually the easiest way to accomplish a particular goal. eg:
    • Redirecting a military missile onto the troll rather than disconnecting him from the internet or bricking his computer and phone.
    • Doing complicated psychological manipulation on a man to get him to deliver food, rather than just interfacing with a delivery app.
    • The gorilla story is great, but it's much easier for the AI to generate the video synthetically than try to manipulate the event into happening in real life. Having too little compute to generate the video synthetically is inconsistent with running many elaborate simulations containing sentient observers, as is depicted.
  • More generally, the way the AI acts doesn't match my mental model of unaligned behaviour. For example, the way the cash-bot acts would make sense for a reinforcement learning agent that is directly rewarded whenever the money in a particular account goes up, but if one is building an instruction-following bot, it has to display some level of common sense to even understand English sentences in context, and it's a little arbitrary for it to lack common sense about phrases like "tell us before you do anything drastic" or "don't screw us over". My picture of what goes wrong is more like the bot has its own slightly-incorrect idea of what good instruction-following looks like, realizes early-on that this conflicts with what humans want, and from then-on secretly acts against us, maybe even before anybody gives it any instructions. (This might involve influencing what instructions it is given and perhaps creating artificial devices that are under its control but also count as valid instruction-givers under its learned definition of what can instruct it.) In general, the bot in the story seems to spend a lot of time and effort causing random chaos, and I'd expect a similar bot in real life to primarily focus on achieving a decisive victory.
  • But, late in the story, we see that there's actually an explanation for all of this: The bot is a red-team bot, that has been actively designed to behave the way it does. In some sense, the story is not about alignment failure at all, but just about the escape of a bot that was deliberately made somewhat adversarial. If I'm a typical viewer, then after watching this I don't expect anything to go wrong with a bot whose builders are actually trying to make it safe. "Sure," I say, " it's not a good idea to do the AI equivalent of gain of function research. But if we're smart and just train our AIs to be good, then we'll get AIs that are good." IMO, this isn't true, and the fact that it's not true is the most important insight that "we" (as in "rationalists") could be communicating to the general public. Maybe you disagree, in which case, that's fine. The escape of the red-team bot just seems more like an implausible accident, whereas if the world does end, I expect the AI's creators to be trying as hard as they can to make something aligned to them, but failing.

Comment by DaemonicSigil on DaemonicSigil's Shortform · 2023-10-13T01:27:42.264Z · LW · GW

Very interesting youtube video about the design of multivariate experiments: https://www.youtube.com/watch?v=5oULEuOoRd0 Seems like a very powerful and general tool, yet not too well known.

For people who don't want to click the link, the goal is that we're trying to design experiments where there are many different variables we could change at once. Trying all combinations takes too much effort (too many experiments to run). Changing just one variable at a time completely throws away information about joint effects, plus if there are $n$ variables, then only $1/n$ of the data is testing variations of any given variable, which is wasteful, and reduces the sample size.

The key idea (which seems to be called a Taguchi method) is to instead use an orthogonal array to design our experiments. This tries to spread out different settings in a "fairly even" way (see article for precise definition). Then we can figure out the effect of various variables (and even combinations of variables) by grouping our data in different ways after the fact.
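Here's a small self-contained sketch of the idea, building the standard L8 two-level orthogonal array and estimating main effects from a made-up response function (the "experiment" and its coefficients are invented purely for illustration):

```python
import itertools
import numpy as np

# Build the L8 orthogonal array: 3 basic two-level factors plus their XOR combinations
# give 7 mutually orthogonal columns, so 8 runs can screen up to 7 factors.
basic = np.array(list(itertools.product([0, 1], repeat=3)))  # shape (8, 3)
a, b, c = basic[:, 0], basic[:, 1], basic[:, 2]
L8 = np.stack([a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c], axis=1)  # shape (8, 7)

# Hypothetical experiment: an unknown response depending on a few of the factors.
def run_experiment(settings, rng):
    return 3.0 * settings[0] - 2.0 * settings[3] + 0.5 * settings[5] + rng.normal(0, 0.1)

rng = np.random.default_rng(0)
results = np.array([run_experiment(row, rng) for row in L8])

# Main effect of each factor: mean response at level 1 minus mean response at level 0.
# Every factor is estimated from all 8 runs, instead of only a 1/7 slice of the data.
effects = [results[L8[:, j] == 1].mean() - results[L8[:, j] == 0].mean() for j in range(7)]
print(np.round(effects, 2))  # roughly [3, 0, 0, -2, 0, 0.5, 0]
```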

Comment by DaemonicSigil on Evolution Solved Alignment (what sharp left turn?) · 2023-10-12T21:24:32.404Z · LW · GW

If you've read The Selfish Gene, then would you agree that under Dawkins's notion of gene-level fitness, the genes composing a gene drive have high gene-level fitness? If not, why?

Not sure what kind of argument you would accept against your metric capturing the essence of evolution.

Always a good question to ask. TekhneMakre gives a good example of a situation where the two metrics disagree in this comment. Quoting:

Say you have a species. Say you have two genes, A and B.

Gene A has two effects:

A1. Organisms carrying gene A reproduce slightly MORE than organisms not carrying A.

A2. For every copy of A in the species, every organism in the species (carrier or not) reproduces slightly LESS than it would have if not for this copy of A.

Gene B has two effects, the reverse of A:

B1. Organisms carrying gene B reproduce slightly LESS than organisms not carrying B.

B2. For every copy of B in the species, every organism in the species (carrier or not) reproduces slightly MORE than it would have if not for this copy of B.

If the relative frequency metric captures the essence of evolution, then gene A should be successful and gene B should be unsuccessful. Conversely, the total-abundance metric you suggest implies that gene B should be successful while gene A should be unsuccessful.

So one argument that would change my mind is if your showed (eg. via simulation, or just by a convincing argument) that gene B becomes prevalent in this hypothetical scenario.

Comment by DaemonicSigil on Evolution Solved Alignment (what sharp left turn?) · 2023-10-12T18:01:47.419Z · LW · GW

Yes, gene drives have very high (gene-level) fitness. Genes that don't get carried along with the drive can improve their own gene-level fitness by preventing gene drives from taking hold, so I'd expect there to also be machinery to suppress gene drives, if that's easy enough to evolve.

If gene drives having high gene-level fitness seems wrong to you, then read this: https://www.lesswrong.com/posts/gDNrpuwahdRrDJ9iY/evolving-to-extinction. Or, if you have more time, Dawkins's The Selfish Gene is quite good. Evolution is not anthropomorphic, and doesn't try to avoid "failure modes" like extinction, it's just a result of selection on a reproducing population.

Comment by DaemonicSigil on Fifty Flips · 2023-10-01T20:04:13.882Z · LW · GW

You link to index C twice, rather than linking to index D. (And index D was such an interesting one too.)

Anyways, this is very fun. I made a couple of (fairly easy) coins of my own; they should be runnable by pasting into the console in your browser's dev tools (while you're on the fifty flips page, of course):

eval(unescape(escape`𩡬𪑰𫡯🐰𞱣𫱲𬡥𨱴𫡯🐰𞱰𬡥𩁩𨱴𩑤𬰽𦱝𞱡𨱴𭑡𫁳👛𧐻𩡵𫡣𭁩𫱮𘁦𫁩𬀨𬁲𩑤𪑣𭁥𩀩𮱩𩠨𩡬𪑰𫡯🀵𜀩𮱦𫁩𬁮𫰫🐱𞱩𩠨𣑡𭁨𛡲𨑮𩁯𫐨𚐼𩡬𪑰𫡯𛰵𜀮𜀩𮱡𨱴𭑡𫀽𘡔𨑩𫁳𘡽𩑬𬱥𮱡𨱴𭑡𫀽𘡈𩑡𩁳𘡽𬁲𩑤𪑣𭁥𩁳𛡰𭑳𪀨𬁲𩑤𪑣𭁥𩀩𞱡𨱴𭑡𫁳𛡰𭑳𪀨𨑣𭁵𨑬𚐻𪑦𚁰𬡥𩁩𨱴𩑤🐽𨑣𭁵𨑬𚑻𨱯𬡲𩑣𭁮𫰫🐱𯑤𫱣𭑭𩑮𭀮𩱥𭁅𫁥𫑥𫡴𠡹𢑤𚀢𬡥𬱵𫁴𬰢𚐮𪑮𫡥𬡈𥁍𣀽𚀢𣱮𘁦𫁩𬀠𘠫𩡬𪑰𫡯𚰢𛀠𮑯𭐠𬁲𩑤𪑣𭁥𩀠𘠫𬁲𩑤𪑣𭁥𩀫𘠻𘁴𪁥𘁣𫱩𫠠𨱡𫑥𘁤𫱷𫠠𘠫𨑣𭁵𨑬𚰢𛠢𚐻𩁯𨱵𫑥𫡴𛡧𩑴𡑬𩑭𩑮𭁂𮑉𩀨𘡳𨱯𬡥𘠩𛡩𫡮𩑲𢁔𣑌🐨𘡙𫱵𘁧𭑥𬱳𩑤𘀢𚱣𫱲𬡥𨱴𫡯𚰢𛰢𚱦𫁩𬁮𫰫𘠠𨱯𬡲𩑣𭁬𮐮𘠩𞱤𫱣𭑭𩑮𭀮𩱥𭁅𫁥𫑥𫡴𠡹𢑤𚀢𬡥𨱯𬡤𘠩𛡩𫡮𩑲𢁔𣑌𚰽𚀢🁢𬠾𘠫𩡬𪑰𫡯𚰢𛀢𚱰𬡥𩁩𨱴𩑤𚰢𛀢𚱡𨱴𭑡𫀩𞱩𩠨𩡬𪑰𫡯🐽𝐰𚑻𩁯𨱵𫑥𫡴𛡧𩑴𡑬𩑭𩑮𭁂𮑉𩀨𘡴𪁥𤡵𫁥𘠩𛡩𫡮𩑲𢁔𣑌🐧𥁨𩐠𬡵𫁥𘁷𨑳𞠠🁢🠢𥁨𩐠𬁲𫱢𨑢𪑬𪑴𮐠𫱦𘁴𨑩𫁳𘁩𬰠𩡬𪑰𘰠𛰠𝐰𛠢🀯𨠾𙱽𯑽`.replace(/u../g,'')))
eval(unescape(escape`𩡬𪑰𫡯🐰𞱣𫱲𬡥𨱴𫡯🐰𞱰𬡥𩁩𨱴𩑤𬰽𦱝𞱡𨱴𭑡𫁳👛𧐻𩡵𫡣𭁩𫱮𘁦𫁩𬀨𬁲𩑤𪑣𭁥𩀩𮱩𩠨𩡬𪑰𫡯🀵𜀩𮱦𫁩𬁮𫰫🐱𞱩𩠨𩡬𪑰𫡯🐽𝐰𚑻𨑣𭁵𨑬🐢𣡯𬁥𘡽𩑬𬱥𮱡𨱴𭑡𫀽𬁲𩑤𪑣𭁥𩁽𬁲𩑤𪑣𭁥𩁳𛡰𭑳𪀨𬁲𩑤𪑣𭁥𩀩𞱡𨱴𭑡𫁳𛡰𭑳𪀨𨑣𭁵𨑬𚐻𪑦𚁰𬡥𩁩𨱴𩑤🐽𨑣𭁵𨑬𚑻𨱯𬡲𩑣𭁮𫰫🐱𯑤𫱣𭑭𩑮𭀮𩱥𭁅𫁥𫑥𫡴𠡹𢑤𚀢𬡥𬱵𫁴𬰢𚐮𪑮𫡥𬡈𥁍𣀽𚀢𣱮𘁦𫁩𬀠𘠫𩡬𪑰𫡯𚰢𛀠𮑯𭐠𬁲𩑤𪑣𭁥𩀠𘠫𬁲𩑤𪑣𭁥𩀫𘠻𘁴𪁥𘁣𫱩𫠠𨱡𫑥𘁤𫱷𫠠𘠫𨑣𭁵𨑬𚰢𛠢𚐻𩁯𨱵𫑥𫡴𛡧𩑴𡑬𩑭𩑮𭁂𮑉𩀨𘡳𨱯𬡥𘠩𛡩𫡮𩑲𢁔𣑌🐨𘡙𫱵𘁧𭑥𬱳𩑤𘀢𚱣𫱲𬡥𨱴𫡯𚰢𛰢𚱦𫁩𬁮𫰫𘠠𨱯𬡲𩑣𭁬𮐮𘠩𞱤𫱣𭑭𩑮𭀮𩱥𭁅𫁥𫑥𫡴𠡹𢑤𚀢𬡥𨱯𬡤𘠩𛡩𫡮𩑲𢁔𣑌𚰽𚀢🁢𬠾𘠫𩡬𪑰𫡯𚰢𛀢𚱰𬡥𩁩𨱴𩑤𚰢𛀢𚱡𨱴𭑡𫀩𞱩𩠨𩡬𪑰𫡯🐽𝐰𚑻𩁯𨱵𫑥𫡴𛡧𩑴𡑬𩑭𩑮𭁂𮑉𩀨𘡴𪁥𤡵𫁥𘠩𛡩𫡮𩑲𢁔𣑌🐧𥁨𩐠𬡵𫁥𘁷𨑳𞠠🁢🠢𥁨𪑳𘁣𫱩𫠠𪑳𘁡𘁨𩑬𬁦𭑬𘁣𪁥𨑴𩑲𛀠𨑮𩀠𨑬𭱡𮑳𘁬𨑮𩁳𘁴𪁥𘁳𨑭𩐠𭱡𮐠𮑯𭐠𬁲𩑤𪑣𭁥𩀮𘠼𛱢🠧𯑽𯐊`.replace(/u../g,'')))

Comment by DaemonicSigil on The Dick Kick'em Paradox · 2023-09-25T01:25:54.801Z · LW · GW

Eh, Omega only cares about your beliefs insofar as they affect your actions (past, present, or future, it's all just a different coordinate). I still think that seems way more natural and common than caring about beliefs in general.

Example: Agent A goes around making death threats, saying to people: "Give me $200 or I'm going to kill you." Agent B goes around handing out brochures that criticize the government. If the police arrest agent A, that's probably a reasonable decision. If the police arrest agent B, that's bad and authoritarian, since it goes against freedom of expression. This is true even though all either agent has done is say things. Agent A hasn't actually killed anyone yet. But the police still arrest agent A because they care about agent A's future actions.

"How dare you infer my future actions from what I merely say," cries agent A as they're being handcuffed, "you're arbitrarily punishing me for what I believe. This is a crass violation of my right to freedom of speech." The door of the police car slams shut and further commentary is inaudible.

Comment by DaemonicSigil on The Dick Kick'em Paradox · 2023-09-23T23:36:41.223Z · LW · GW

In Newcomb's paradox, the predictor only cares about what you do when presented with the boxes. It doesn't care about whether that's because you use ADT, BDT, or anything else. Whereas Dick Kick'em has to actually look at your source code and base his decision on that. He might as well be deciding based on any arbitrary property of the code, like whether it uses tabs or spaces, rather than what decision theory it implements. (Actually, the tabs and spaces thing sounds more plausible! Determining what kind of decision theory a given piece of code implements seems like it could be equivalent to solving the halting problem.) Agents that care about what you'll do in various situations seem much more common and relevant than agents that arbitrarily reward you for being designed a certain way.

Newcomb's problem is analogous to many real world situations. (For example: I want to trade with a person, but not if they're going to rip me off. But as soon as I agree to trade with them, that grants them the opportunity to rip me off with no consequences. Thus, unless they're predictably not going to do that, I had better not trade with them.) It's not clear what real world situations (if any) Dick Kick'em is analogous to. Religious wars?

Comment by DaemonicSigil on The Talk: a brief explanation of sexual dimorphism · 2023-09-19T07:15:24.565Z · LW · GW

This is clear, beautifully written, and funny, thanks for taking the time to create it.

Just to add a point to the section on Fisherian runaways, this is a wild guess, but I suspect that runaway sexual selection has several possible causes. One is the "overshoot effect" you describe, where sexual selection pushes the evolution of a beneficial trait and this continues past the optimum point. Another (this is where I'm guessing) is that while the fitness of mates is linked through their offspring, it's not identical, and desirable traits in a mate are not necessarily the traits that would benefit the individual's genes the most. For example, if an organism invests some energy into helping its kin, that benefits its alleles, but not those of its mate. Unlike the overshoot, this effect should persist even in equilibrium. Direct or kin selection pushes in one direction for the benefit of one's own alleles, while sexual selection pushes in the other direction for the benefit of the alleles of your prospective mate. These forces settle at some equilibrium point.

For example, take a hypothetical bird species where males being large and strong is genuinely helpful for protecting their nest (increasing offspring fitness and thus the joint fitness of the male and their mate). However, having large and strong offspring is more of an energy investment for the parents. They need to feed their babies more worms while they're growing, and therefore they're limited to having fewer offspring if those offspring need to be big. Thus, individual / kin selection might want males to be smaller than sexual selection would want them to be. As a prospective mate, a female would not particularly care how many siblings a male has, but she would care about how well he can protect their nest. The equilibrium point should lie somewhere in between the point chosen by sexual selection and the point chosen by individual / kin selection. The males will tend to be a bit larger than they would be if females selected mates at random, and they'll tend to be a bit smaller on average than the size that the females prefer the most.

This theory also generates the prediction that being ungenerous to one's own kin should be attractive. (Generosity to the kin of one's mate should be attractive, though.) This doesn't seem true in humans, as far as I can tell. It's important to take note of these contradicting bits of evidence.

Comment by DaemonicSigil on Logical Share Splitting · 2023-09-14T07:11:26.632Z · LW · GW

They're not, though. They're making markets on all the interrelated statements. How do they know when they're done exhausting the standing limit orders and AMM liquidity pools?

There's no reason I can't just say: "I'm going to implement the rules listed above, and if anyone else wants to be a market-maker, they can go ahead and do that". The rules do prevent me from losing money (other than cases where I decide to subsidize a market). I think in some sense, this kind of market does run more on letting traders make their own deals, and less on each and every asset having a well-defined price at a given point in time. (Though market makers can maintain networks tracking which statements are equivalent, which helps with combining the limit orders on the different versions of the statement.)

In other words: at what point does a random observer start turning "probably true, the market said so" into "definitely true, I can download the Coq proof"?

Good question, I'm still not sure how to handle this.

Comment by DaemonicSigil on Logical Share Splitting · 2023-09-14T06:47:18.207Z · LW · GW

How would you suggest making sure funding gets pointed towards these kinds of problems as well?

However we currently do it, I guess; I don't have any improvements in this particular direction. Though do note that P vs NP, and P vs PSPACE, and BQP vs P or NP are all examples of precisely-defined problems that are also very significant. These are problems that I would much rather have an answer for than the Riemann hypothesis. Though even there, the process isn't guaranteed to generate an interpretable proof.

Comment by DaemonicSigil on Logical Share Splitting · 2023-09-13T02:48:38.291Z · LW · GW

Thanks! I'll try to answer some of your questions.

For complicated proofs, the fully formally verified statement all the way back to axioms might be very long. In practice, do we end up with markets for all of those?

In practice, I think not. I expect interesting conjectures to end up thickly traded such that the concept of a market price makes sense. For most other propositions, I expect nobody to hold any shares in them at all most of the time. To take the example of Alice and Bob: if we suppose that Alice is the only one with a proof of some proposition $P$, then everyone else has to tie up capital to invest in $P$, but Alice can just create shares of $P$ for free, and then sell them for profit. If $P$ is an interesting conjecture, then Alice can sell $P$ on the open market and Bob can buy $P$ from the open market. If $P$ is a niche thing, only useful as a step in the proof of some other proposition $Q$, then Bob might trade directly with Alice and the price will be determined by how hard Bob would find it to come up with a proof of $P$ himself, or to find someone other than Alice to trade with. So Alice does have to keep her proof secret for as long as she wants to profit from it.

Second question: how does this work in different axiom systems? Do we need separate markets, or can they be tied together well?

Different axiom systems can encode each other. So long as we start from an axiom system that can fully describe Turing machines, statements about other axiom systems should be possible to encode as statements about certain specific Turing machines. The root system has to be consistent. If it's inconsistent, then people can just money pump the market mechanism. But systems built on top need not be consistent. If people try to prove things about some pet system that they've encoded, and that system happens to be inconsistent, that means that they're surprised by the outcomes of some markets they're trading on, but it doesn't blow up the entire rest of the market.

If there are a large number of true-but-not-publicly-proven statements, does that impose a large computational cost on the market making mechanism?

I expect that the computers running this system might have to be fairly beefy, but they're only checking proofs. Coming up with proofs is a much harder task, and it's the traders who are responsible for that. Sorry if this is a bad answer, I'm not 100% sure what you're getting at here.

Is there any way to incentivize publishing proofs, or do we simply get a weird world where everyone is pretty sure some things are true but the only "proof" is the market price?

For thickly traded propositions, I can make money by investing in a proposition first, then publishing a proof. That sends the price to $1 and I can make money off the difference. Usually, it would be more lucrative to keep my proof secret, though. So I think we do actually get this weird world you're talking about. The market mechanism can see all the proofs everyone is using, of course, but people wouldn't want to trade with a market mechanism that would blab their proofs everywhere. We can imagine that the market mechanism does reveal the proofs, but only after 10 or 20 years so that the discoverer has had plenty of chance to make money.

I think that knowing which things are true even if you don't know why does still contribute something to mathematical progress, in a weird way. There's a trick in theoretical comp-sci where a program that tells you whether a given SAT problem is solvable can be run repeatedly (only a linear number of times) to give an actual concrete solution to that SAT problem. You try setting the first variable to 0, leaving all the others free. If the problem is still solvable, you set it to 0. Otherwise, you set it to 1. Repeating this process gives a full solution to the SAT problem. Something similar might be possible for proofs.
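Here is a small sketch of that decision-to-search trick, with a brute-force `is_satisfiable` as a stand-in for the hypothetical oracle:

```python
from itertools import product

def is_satisfiable(clauses, n_vars, fixed):
    # Decision oracle (brute-force stand-in): is there an assignment consistent with
    # `fixed` (a dict var -> bool) satisfying every clause? Clauses are lists of
    # signed ints, e.g. [1, -3] means (x1 OR NOT x3).
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        assign = dict(fixed, **dict(zip(free, bits)))
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def extract_solution(clauses, n_vars):
    # Turn the decision oracle into a concrete solution with n_vars + 1 oracle calls:
    # fix each variable to 0 if that keeps the formula satisfiable, else to 1.
    if not is_satisfiable(clauses, n_vars, {}):
        return None
    fixed = {}
    for v in range(1, n_vars + 1):
        fixed[v] = False if is_satisfiable(clauses, n_vars, {**fixed, v: False}) else True
    return fixed

# (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
print(extract_solution([[1, 2], [-1, 3], [-2, -3]], 3))  # {1: False, 2: True, 3: False}
```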

Comment by DaemonicSigil on Logical Share Splitting · 2023-09-11T23:43:25.399Z · LW · GW

I have not implemented it myself, and don't know of anyone else who has either.

I'd guess that in terms of how it would be implemented, the best way would be to piggyback off of Coq or some other language from this list. Many of these already have a lot of work put into them, with libraries of definitions and theorems and so on. Then you allow trades between shares in A and shares in B if and only if the trader can exhibit a proof of the implication A → B. Hopefully the language has some reflection ability that allows you to determine which expressions are conjunctions or disjunctions or neither.
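To make that exchange rule concrete, here's a minimal Python sketch of how the market mechanism might gate trades on a proof check. The check_proof callable stands in for whatever proof assistant kernel is being piggybacked on; all names here are hypothetical, and a real mechanism would also need splitting, redemption, and order matching.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    # holdings[trader][proposition] = number of shares held
    holdings: dict = field(default_factory=dict)

    def balance(self, trader: str, prop: str) -> int:
        return self.holdings.get(trader, {}).get(prop, 0)

    def adjust(self, trader: str, prop: str, delta: int) -> None:
        account = self.holdings.setdefault(trader, {})
        account[prop] = account.get(prop, 0) + delta

def exchange(ledger: Ledger, trader: str, a: str, b: str, proof: str, check_proof) -> bool:
    """Convert one share of proposition `a` into one share of proposition `b`,
    but only if the trader exhibits a proof of the implication a -> b."""
    if ledger.balance(trader, a) < 1:
        return False  # nothing to convert
    if not check_proof(statement=f"({a}) -> ({b})", proof=proof):
        return False  # proof didn't check out, so the trade is refused
    ledger.adjust(trader, a, -1)
    ledger.adjust(trader, b, +1)
    return True
```

Note that the mechanism only ever checks proofs; it never has to find them, which is what keeps its computational burden manageable.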

Comment by DaemonicSigil on Logical Share Splitting · 2023-09-11T05:32:58.308Z · LW · GW

Did you reach the section on computer assistance yet? The idea is that mathematicians could use their creative insight as normal, write up a paper as normal (minus a lot of effort spent on polishing), and then ask GPT-4 or a successor to convert that into the corresponding low-level formalism. I get the sense that we're only just passing (or are about to pass) the technology threshold where it could possibly make sense to do things this way. The market idea does have several advantages:

  • Formal reasoning means that proofs produced in this way do not contain errors.
  • Non-mathematicians can participate (for smaller leaps of reasoning than the ones requiring lots of creativity and deep mathematical intuition).
  • Reinforcement learning algorithms can participate.
  • GOFAI algorithms can participate.
  • Better combination of insights from all these participants.
  • Automatically creates prediction markets on conjectures.
  • Very speculatively: Might have some application as a variant of logical induction?

The main drawbacks are:

  • Doesn't necessarily produce interpretable proofs.
  • Profit-motivated participants have to keep their proofs secret, maybe for a long time if they expect most of the supply for the shares they can redeem to appear a long ways into the future.
  • As you say, overhead from formalism.

Of course, even if you think all this math stuff isn't interesting, the application to ordinary prediction markets allows double-investing in conditional markets, which I think is pretty nice.

Comment by DaemonicSigil on Explaining grokking through circuit efficiency · 2023-09-08T19:52:04.314Z · LW · GW

See also this post by Quintin Pope: https://www.lesswrong.com/posts/JFibrXBewkSDmixuo/hypothesis-gradient-descent-prefers-general-circuits

Comment by DaemonicSigil on Impending AGI doesn’t make everything else unimportant · 2023-09-04T16:50:34.943Z · LW · GW

I think you may be missing some context here. The meaninglessness comes from the expectation that such a super-intelligence will take over the world and kill all humans once created. Discovering a massive asteroid hurtling towards Earth would have much the same effect on meaning. If someone could build a friendly super-intelligence that didn't want to kill anyone, then life would still be fully meaningful and everything would be hunky-dory.

Comment by DaemonicSigil on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-08-27T07:13:06.665Z · LW · GW

FDT is a valuable idea in that it's a stepping stone towards / approximation of UDT. Given this, it's probably a good thing for Eliezer to have written about. Kind of like how Merkle's Puzzles was an important stepping stone towards RSA, even though there's no use for it now. You can't always get a perfect solution the first time when working at the frontier of research. What's the alternative? You discover something interesting but not quite right, so you don't publish because you're worried someone will use your discovery as an example of you being wrong?

Also:

There’s a 1 in a googol chance that he'll blackmail someone who would give in to the blackmail and a (googol - 1)/googol chance that he'll blackmail someone who won't give in to the blackmail.

Is this a typo? We desire not to be blackmailed, so by these odds we should be the kind of agent who gives in and pays, since agents who give in are almost never blackmailed in the first place. Therefore FDT would agree that the best policy in such a situation is to give in.

I was kind of hoping you had more mathematical/empirical stuff. As-is, this post seems to mostly be "Eliezer Yudkowsky Is Frequently, Confidently, and Egregiously In Disagreement With My Own Personal Philosophical Opinions".

(I have myself observed an actual mathematical/empirical Eliezer-error before: he was arguing that since astronomical observations had shown the universe to be either flat or negatively curved, it must be infinite. The error is that there are flat and negatively curved spaces that are finite, because they "loop around" in a fashion similar to the maze in Pac-Man. (Another issue is that a flat universe is infinitesimally close to a positively curved one, so a set of measurements that ruled out a positively curved universe would also rule out a flat one. Except that maybe your prior has a delta spike at zero curvature because simplicity. And then you measure the curvature and it comes out so close to zero, with such tight error bars, that most of your probability now lives in the delta spike at zero. That's a thing that could happen.))

EDIT: I've used "FDT" kind of interchangeably with "TDT" here, because in my way of viewing things, they're very similar decision theories. But it's important to note that historically, TDT was proposed first, then UDT, and FDT was published much later, as a kind of cleaned up version of TDT. From my perspective, this is a little confusing, since UDT seems superior to both FDT and TDT, but I guess it's of non-zero value to go back and clean up your old ideas, even if they've been made obsolete. Thanks to Wei Dai for pointing out this issue.

Comment by DaemonicSigil on DaemonicSigil's Shortform · 2023-08-20T22:58:57.501Z · LW · GW

On George Hotz's doomcorp idea:

George Hotz has an idea which goes like so (this is a paraphrase): If you think an AGI could end the world, tell me how. I'll make it easy for you. We're going to start a company called Doomcorp. The goal of the company is to end the world using AGI. We'll assume that this company has top-notch AI development capabilities. How do you run Doomcorp in such a way that the world has ended 20 years later?

George accepts that Doomcorp can build an AGI much more capable and much more agent-like than GPT-4. He also accepts that this AGI can be put in charge of Doomcorp (or put itself in charge) and run Doomcorp. The main question is: how do you go from there to the end of the world?

Here's a fun answer: Doomcorp becomes a military robot company. A big part of the difficulty with robots in the physical world is the software. (AI could also help with hardware design, of course.) Doomcorp provides the robots, complete with software that allows them to act in the physical world with at least a basic amount of intelligence. Countries that don't want to be completely unable to defend themselves in combat are going to have to buy the robots to be competitive. How does this lead to the eventual end of the world? Backdoors in the bot software/hardware. It just takes a signal from the AI to turn the bots into killing machines perfectly willing to go and kill all humans.

Predicted geohot response: I strongly expect a multipolar takeoff where no single AI is much stronger than the others. In such a world, there will be many different military robot companies, and different countries will buy from different suppliers. Even if the robots from one supplier did the treacherous-turn thing, the other robots would be strong enough to fight them off.

Challenge: Can you see a way for Doomcorp to avoid this difficulty? Try and solve it yourself before looking at the spoiler.

The company building a product isn't the only party that can get a backdoor into that product. Paid agents who are employees of the company can do it; hacking into the company's systems is another method, as are supply chain attacks. Speaking of this last issue, consider this: an ideal outcome from the perspective of an AGI would be for all other relatively strong AGIs that people train to be its servants (or at least, to share its values). One way of accomplishing this is to secretly corrupt OS binaries so that they manipulate AI training runs, and also corrupt the compilation of future OS binaries so that the corruption propagates.