Given o1, I want to remark that the prediction in (2) was right. Instead of training LLMs to give short answers, an LLM is trained to give long answers and another LLM summarizes.
That's fair, yeah
We need a proper mathematical model to study this further. I expect it to be difficult to set up because the situation is so unrealistic/impossible as to be hard to model. But if you do have a model in mind I'll take a look
It would help to have a more formal model, but as far as I can tell the oracle can only narrow down its predictions of the future to the extent that those predictions are independent of the oracle's output. That is to say, if the people in the universe ignore what the oracle says, then the oracle can give an informative prediction.
This would seem to exactly rule out any type of signal which depends on the oracle's output, which is precisely the types of signals that nostalgebraist was concerned about.
The problem is that the act of leaving the message depends on the output of the oracle (otherwise you wouldn't need the oracle at all, but you also would not know how to leave a message). If the behavior of the machine depends on the oracle's actions, then we have to be careful with what the fixed point will be.
For example, if we try to fight the oracle and do the opposite, we get the "noise" situation from the grandfather paradox.
But if we try to cooperate with the oracle and do what it predicts, then there are many different fixed points and no telling which the oracle would choose (this is not specified in the setting).
It would be great to see a formal model of the situation. I think any model in which such message transmission would work is likely to require some heroic assumptions which don't correspond much to real life.
Thanks for the link to reflective oracles!
On the gap between the computable and uncomputable: It's not so bad to trifle a little. Diagonalization arguments can often be avoided with small changes to the setup, and a few of Paul's papers are about doing exactly this.
I strongly disagree with this: diagonalization arguments often cannot be avoided at all, no matter how you change the setup. This is what vexed logicians in the early 20th century: no matter how you change your formal system, you won't be able to avoid Gödel's incompleteness theorems.
There is a trick that reliably gets you out of such paradoxes, however: switch to probabilistic mixtures. This is easily seen in a game setting: in rock-paper-scissors, there is no deterministic Nash equilibrium. Switch to mixed strategies, however, and suddenly there is always a Nash equilibrium.
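To make the rock-paper-scissors claim concrete, here is a minimal Python sketch. The names `payoff` and `is_pure_nash` and the +1/0/-1 scoring are illustrative assumptions of mine, not part of any setup discussed here. It brute-forces all nine pure strategy pairs, then checks that every move earns the same expected payoff against a uniformly random opponent, which is why nobody can gain by deviating from the uniform mix.

```python
from fractions import Fraction
from itertools import product

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    """Payoff to the player choosing a against b: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def is_pure_nash(a, b):
    """True if neither player can gain by unilaterally switching moves."""
    best_for_a = max(payoff(x, b) for x in MOVES)
    best_for_b = max(payoff(y, a) for y in MOVES)
    return payoff(a, b) == best_for_a and payoff(b, a) == best_for_b

# Deterministic (pure) equilibria: expected to be empty.
pure_equilibria = [(a, b) for a, b in product(MOVES, MOVES) if is_pure_nash(a, b)]

# Expected payoff of each pure move against the uniform mixed strategy.
uniform = {m: Fraction(1, 3) for m in MOVES}
expected_vs_uniform = {
    a: sum(uniform[b] * payoff(a, b) for b in MOVES) for a in MOVES
}

print(pure_equilibria)
print(expected_vs_uniform)
```

Since every move earns the same expected payoff against the uniform mix, no deviation helps, so (uniform, uniform) is an equilibrium even though no deterministic pair is.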
This is the trick that Paul is using: he is switching from deterministic Turing machines to randomized ones. That's fine as far as it goes, but it has some weird side effects. One of them is that if a civilization is trying to predict the universal prior that is simulating itself, and tries to send a message, then it is likely that with "reflective oracles" in place, the only message it can send is random noise. That is, Paul shows reflective oracles exist in the same way that Nash equilibria exist; but there is no control over what the reflective oracle actually is, and in paradoxical situations (like rock-paper-scissors) the Nash equilibrium is the boring "mix everything together uniformly".
The underlying issue is that a universe that can predict the universal prior, which in turn simulates the universe itself, can encounter a grandfather paradox. It can see its own future by looking at the simulation, and then it can do the opposite. The grandfather paradox is where the universe decides to kill the grandfather of a child that the simulation predicts.
Paul solves this by only letting it see its own future using a "reflective oracle" which essentially finds a fixed point (which is a probability distribution). The fixed point of a grandfather paradox is something like "half the time the simulation shows the grandchild alive, causing the real universe to kill the grandfather; the other half the time, the simulation shows the grandfather dead and the grandchild not existing". Such a fixed point exists even when the universe tries to do the opposite of the prediction.
The thing is, this fixed point is boring! Repeat this enough times, and it eventually just says "well my prediction about your future is random noise that doesn't have to actually come true in your own future". I suspect that if you tried to send a message through the universal prior in this setting, the message would consist of essentially uniformly random bits. This would depend on the details of the setup, I guess.
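A toy version of the grandfather-paradox fixed point, as a hedged sketch rather than anything from Paul's actual construction: let p be the probability that the oracle predicts "grandchild alive", and let a hypothetical `response(p)` give the probability that the grandchild really ends up alive when the universe sees that prediction. A universe that always does the opposite has `response(p) = 1 - p`, and consistency requires `response(p) = p`. The function names and the bisection solver are my own assumptions for illustration.

```python
def fixed_point(response, lo=0.0, hi=1.0, iters=60):
    """Bisection on response(p) - p.

    Assumes response is continuous with response(lo) >= lo and
    response(hi) <= hi, so a crossing exists in [lo, hi].
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if response(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def contrarian(p):
    """Universe that always does the opposite of what it is shown."""
    return 1.0 - p

p_star = fixed_point(contrarian)
print(p_star)
```

For the contrarian universe the only consistent prediction is the coin flip p = 1/2, which is the "boring" fixed point. A fully cooperative universe, `response(p) = p`, would instead make every p a fixed point, so the solver above would not single one out.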
I think the problem to grapple with is that I can cover the rationals in [0,1] with countably many intervals of total length only 1/2 (e.g. enumerate the rationals in [0,1], and place an interval of length 1/4 around the first rational, an interval of length 1/8 around the second, etc). This is not possible with reals -- that's the insight that makes measure theory work!
The covering means that the rationals in an interval cannot have a well defined length or measure which behaves reasonably under countable unions. This is a big barrier to doing probability theory. The same problem happens with ANY countable set -- the reals only avoid it by being uncountable.
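The covering can be written out explicitly. This is a small sketch with exact arithmetic; the particular enumeration (by increasing denominator) and the helper names are my own choices, and any enumeration would work equally well. The k-th rational (counting from 0) gets an interval of length 1/2^(k+2), so the first few lengths are 1/4, 1/8, 1/16, and so on.

```python
from fractions import Fraction

def rationals_in_unit_interval():
    """Enumerate every rational in [0, 1] exactly once, by increasing denominator."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            f = Fraction(p, q)
            if f.denominator == q:  # skip fractions that were not in lowest terms
                yield f
        q += 1

def cover(n):
    """Intervals around the first n rationals; the k-th has length 1 / 2**(k + 2)."""
    intervals = []
    for k, r in zip(range(n), rationals_in_unit_interval()):
        half_width = Fraction(1, 2 ** (k + 3))
        intervals.append((r - half_width, r + half_width))
    return intervals

total_length = sum(hi - lo for lo, hi in cover(50))
print(total_length, total_length < Fraction(1, 2))
```

However many rationals are covered, the summed length is a partial sum of 1/4 + 1/8 + ... and so stays strictly below 1/2, even though the covered set is dense in [0,1].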
Evan Morikawa?
https://twitter.com/E0M/status/1790814866695143696
Weirdly aggressive post.
I feel like maybe what's going on here is that you do not know what's in The Bell Curve, so you assume it is some maximally evil caricature? Whereas what's actually in the book is exactly Scott's position, the one you say is "his usual "learn to love scientific consensus" stance".
If you'd stop being weird about it for just a second, could you answer something for me? What is one (1) position that Murray holds about race/IQ and Scott doesn't? Just name a single one, I'll wait.
Or maybe what's going on here is that you have a strong "SCOTT GOOD" prior as well as a strong "MURRAY BAD" prior, and therefore anyone associating the two must be on an ugly smear campaign. But there's actually zero daylight between their stances and both of them know it!
Relatedly, if you cannot outright make a claim because it is potentially libellous, you shouldn't use vague insinuation to imply it to your massive and largely-unfamiliar-with-the-topic audience.
Strong disagree. If I know an important true fact, I can let people know in a way that doesn't cause legal liability for me.
Can you grapple with the fact that the "vague insinuation" is true? Like, assuming it's true and that Cade knows it to be true, your stance is STILL that he is not allowed to say it?
Your position seems to amount to the epistemic equivalent of 'yes, the trial was procedurally improper, and yes the prosecutor deceived the jury with misleading evidence, and no the charge can't actually be proven beyond a reasonable doubt, but he's probably guilty anyway, so what's the issue'. I think the issue is journalistic malpractice. Metz has deliberately misled his audience in order to malign Scott on a charge which you agree cannot be substantiated, because of his own ideological opposition (which he admits). To paraphrase the same SSC post quoted above, he has locked himself outside of the walled garden. And you are "Andrew Cord", arguing that we should all stop moaning because it's probably true anyway so the tactics are justified.
It is not malpractice, because Cade had strong evidence for the factually true claim! He just didn't print the evidence. The evidence was of the form "interview a lot of people who know Scott and decide who to trust", which is a difficult type of evidence to put into print, even though it's epistemologically fine (in this case IT LED TO THE CORRECT BELIEF so please give it a rest with the malpractice claims).
Here is the evidence of Scott's actual beliefs:
https://twitter.com/ArsonAtDennys/status/1362153191102677001
As for your objections:
- First of all, this is already significantly different, more careful and qualified than what Metz implied, and that's after we read into it more than what Scott actually said. Does that count as "aligning yourself"?
This is because Scott is giving a maximally positive spin on his own beliefs! Scott is agreeing that Cade is correct about him! Scott had every opportunity to say "actually, I disagree with Murray about..." but he didn't, because he agrees with Murray just like Cade said. And that's fine! I'm not even criticizing it. It doesn't make Scott a bad person. Just please stop pretending that Cade is lying.
Relatedly, even if Scott did truly believe exactly what Charles Murray does on this topic, which again I don't think we can fairly assume, he hasn't said that, and that's important. Secretly believing something is different from openly espousing it, and morally it can be much different if one believes that openly espousing it could lead to it being used in harmful ways (which from the above, Scott clearly does, even in the qualified form which he may or may not believe). Scott is going to some lengths and being very careful not to espouse it openly and without qualification, and clearly believes it would be harmful to do so, so it's clearly dishonest and misleading to suggest that he has "aligned himself" with Charles Murray on this topic. Again, this is even after granting the very shaky proposition that he secretly does align with Charles Murray, which I think we have established is a claim that cannot be substantiated.
Scott so obviously aligns himself with Murray that I knew it before that email was leaked or Cade's article was written, as did many other people. At some point, Scott even said that he will talk about race/IQ in the context of Jews in order to ease the public into it, and then he published this. (I can't find where I saw Scott saying it though.)
- Further, Scott, unlike Charles Murray, is very emphatic about the fact that, whatever the answer to this question, this should not affect our thinking on important issues or our treatment of anyone. Is this important addendum not elided by the idea that he 'aligned himself' with Charles Murray? Would that not be a legitimate "gripe"?
Actually, this is not unlike Charles Murray, who also says this should not affect our treatment of anyone. (I disagree with the "thinking on important issues" part, which Scott surely does think it affects.)
The epistemology was not bad behind the scenes, it was just not presented to the readers. That is unfortunate but it is hard to write a NYT article (there are limits on how many receipts you can put in an article and some of the sources may have been off the record).
Cade correctly informed the readers that Scott is aligned with Murray on race and IQ. This is true and informative, and at the time some people here doubted it before the one email leaked. Basically, Cade's presented evidence sucked but someone going with the heuristic "it's in the NYT so it must be true" would have been correctly informed.
I don't know if Cade had a history of "tabloid rhetorical tricks" but I think it is extremely unbecoming to criticize a reporter for giving true information that happens to paint the community in a bad light. Also, the post you linked by Trevor uses some tabloid rhetorical tricks: it says Cade sneered at AI risk but links to an article that literally doesn't mention AI risk at all.
What you're suggesting amounts to saying that on some topics, it is not OK to mention important people's true views because other people find those views objectionable. And this holds even if the important people promote those views and try to convince others of them. I don't think this is reasonable.
As a side note, it's funny to me that you link to Against Murderism as an example of "careful subtlety". It's one of my least favorite articles by Scott, and while I don't generally think Scott is racist that one almost made me change my mind. It is just a very bad article. It tries to define racism out of existence. It doesn't even really attempt to give a good definition -- Scott is a smart person, he could do MUCH better than those definitions if he tried. For example, a major part of the rationalist movement was originally about cognitive biases, yet "racism defined as cognitive bias" does not appear in the article at all. Did Scott really not think of it?
What Metz did is not analogous to a straightforward accusation of cheating. Straightforward accusations are what I wish he did.
It was quite straightforward, actually. Don't be autistic about this: anyone reasonably informed who is reading the article knows what Scott is accused of thinking when Cade mentions Murray. He doesn't make the accusation super explicit, but (a) people here would be angrier if he did, not less angry, and (b) that might actually pose legal issues for the NYT (I'm not a lawyer).
What Cade did reflects badly on Cade in the sense that it is very embarrassing to cite such weak evidence. I would never do that because it's mortifying to make such a weak accusation.
However, Scott has no possible gripe here. Cade's article makes embarrassing logical leaps, but the conclusion is true and the reporting behind the article (not featured in the article) was enough to show it true, so even a claim of being Gettier Cased does not work here.
Scott thinks very highly of Murray and agrees with him on race/IQ. Pretty much any implication one could reasonably draw from Cade's article regarding Scott's views on Murray or on race/IQ/genes is simply factually true. Your hypothetical author in Alabama has Greta Thunberg posters in her bedroom here.
Wait a minute. Please think through this objection. You are saying that if the NYT encountered factually true criticisms of an important public figure, it would be immoral of them to mention this in an article about that figure?
Does it bother you that your prediction didn't actually happen? Scott is not dying in prison!
This objection is just ridiculous, sorry. Scott made it an active project to promote a worldview that he believes in and is important to him -- he specifically said he will mention race/IQ/genes in the context of Jews, because that's more palatable to the public. (I'm not criticizing this right now, just observing it.) Yet if the NYT so much as mentions this, they're guilty of killing him? What other important true facts about the world am I not allowed to say according to the rationalist community? I thought there was some mantra of like "that which can be destroyed by the truth should be", but I guess this does not apply to criticisms of people you like?
The evidence wasn't fake! It was just unconvincing. "Giving unconvincing evidence because the convincing evidence is confidential" is in fact a minor sin.
I assume it was hard to substantiate.
Basically it's pretty hard to find Scott saying what he thinks about this matter, even though he definitely thinks this. Cade is cheating with the citations here but that's a minor sin given the underlying claim is true.
It's really weird to go HOW DARE YOU when someone says something you know is true about you, and I was always unnerved by this reaction from Scott's defenders. It reminds me of a guy I know who was cheating on his girlfriend, and she suspected this, and he got really mad at her. Like, "how can you believe I'm cheating on you based on such flimsy evidence? Don't you trust me?" But in fact he was cheating.
I think for the first objection about race and IQ I side with Cade. It is just true that Scott thinks what Cade said he thinks, even if that one link doesn't prove it. As Cade said, he had other reporting to back it up. Truth is a defense against slander, and I don't think anyone familiar with Scott's stance can honestly claim slander here.
This is a weird hill to die on because Cade's article was bad in other ways.
What position did Paul Christiano get at NIST? Is it a leadership position?
The problem with that is that it sounds like the common error of "let's promote our best engineer to a manager position", which doesn't work because the skills required to be an excellent engineer have little to do with the skills required to be a great manager. Christiano is the best of the best in technical work on AI safety; I am not convinced putting him in a management role is the best approach.
Eh, I feel like this is a weird way of talking about the issue.
If I didn't understand something and, after a bunch of effort, I managed to finally get it, I will definitely try to summarize the key lesson to myself. If I prove a theorem or solve a contest math problem, I will definitely pause to think "OK, what was the key trick here, what's the essence of this, how can I simplify the proof".
Having said that, I would NOT describe this as asking "how could I have arrived at the same destination by a shorter route". I would just describe it as asking "what did I learn here, really". Counterfactually, if I had to solve the math problem again without knowing the solution, I'd still have to try a bunch of different things! I don't have any improvement on this process, not even in hindsight; what I have is a lesson learned, but it doesn't feel like a shortened path.
Anyway, for the dates thing, what is going on is not that EY is super good at introspecting (lol), but rather that he is bad at empathizing with the situation. Like, go ask EY if he never slacks on a project; he has in the past said he is often incapable of getting himself to work even when he believes the work is urgently necessary to save the world. He is not a person with a 100% solved, harmonic internal thought process; far from it. He just doesn't get the dates thing, so assumes it is trivial.
This is interesting, but how do you explain the observation that LW posts are frequently much much longer than they need to be to convey their main point? They take forever to get started ("what this is NOT arguing: [list of 10 points]" etc) and take forever to finish.
I'd say that LessWrong has an even stronger aesthetic of effort than academia. It is virtually impossible to have a highly-voted LessWrong post without it being long, even though many top posts can be summarized in as little as 1-2 paragraphs.
Without endorsing anything, I can explain the comment.
The "inside strategy" refers to the strategy of safety-conscious EAs working with (and in) the AI capabilities companies like OpenAI; Scott Alexander has discussed this here. See the "Cooperate / Defect?" section.
The "Quokkas gonna quokka" is a reference to this classic tweet which accuses the rationalists of being infinitely trusting, like the quokka (an animal which has no natural predators on its island and will come up and hug you if you visit). Rationalists as quokkas is a bit of a meme; search "quokka" on this page, for example.
In other words, the argument is that rationalists cannot imagine the AI companies would lie to them, and it's ridiculous.
This seems harder, you'd need to somehow unfuse the growth plates.
It's hard, yes -- I'd even say it's impossible. But is it harder than the brain? The difference between growth plates and whatever is going on in the brain is that we understand growth plates and we do not understand the brain. You seem to have a prior of "we don't understand it, therefore it should be possible, since we know of no barrier". My prior is "we don't understand it, so nothing will work and it's totally hopeless".
A nice thing about IQ is that it's actually really easy to measure. Noisier than measuring height, sure, but not terribly noisy.
Actually, IQ test scores increase by a few points if you test again (called test-retest gains). Additionally, IQ varies substantially based on which IQ test you use. It is gonna be pretty hard to convince people you've increased your patients' IQ by 3 points due to these factors -- you'll need a nice large sample with a proper control group in a double-blind study, and people will still have doubts.
More intelligence enables progress on important, difficult problems, such as AI alignment.
Lol. I mean, you're not wrong with that precise statement, it just comes across as "the fountain of eternal youth will enable progress on important, difficult diplomatic and geopolitical situations". Yes, this is true, but maybe see if you can beat botox at skin care before jumping to the fountain of youth. And there may be less fantastical solutions to your diplomatic issues. Also, finding the fountain of youth is likely to backfire and make your diplomatic situation worse. (To explain the metaphor: if you summon a few von Neumanns into existence tomorrow, I expect to die of AI sooner, on average, rather than later.)
This is an interesting post, but it has a very funny framing. Instead of working on enhancing adult intelligence, why don't you start with:
- Showing that many genes can be successfully and accurately edited in a live animal (ideally human). As far as I know, this hasn't been done before! Only small edits have been demonstrated.
- Showing that editing embryos can result in increased intelligence. I don't believe this has even been done in animals, let alone humans.
Editing the brains of adult humans and expecting intelligence enhancement is like 3-4 impossibilities away from where we are right now. Start with the basic impossibilities and work your way up from there (or, more realistically, give up when you fail at even the basics).
My own guess, by the way, is that editing an adult human's genes for increased intelligence will not work, because adults cannot be easily changed. If you think they can, I recommend trying the following instead of attacking the brain; they all should be easier because brains are very hard:
- Gene editing to make people taller. You'd be an instant billionaire. (I expect this is impossible but you seem to be going by which genes are expressed in adult cells, and a lot of the genes governing stature will be expressed in adult cells.)
- Gene editing to enlarge people's penises. You'll be swimming in money! Do this first and you can have infinite funding for anything else you want to do.
- Gene editing to cure acne. Predisposition to acne is surely genetic.
- Gene editing for transitioning (FtM or MtF).
- Gene editing to cure male pattern baldness.
- [Exercise for the reader: generate 3-5 more examples of this general type, i.e. highly desirable body modifications that involve coveting another human's reasonably common genetic traits, and for which any proposed gene therapy can be easily verified to work just by looking.]
All of the above are instantly verifiable (on the other hand, "our patients increased 3 IQ points, we swear" is not as easily verifiable). They all also will make you rich, and they should all be easier than editing the brain. Why do rationalists always jump to the brain?
The market has very strong incentives to solve the above, by the way, and they don't involve taboos about brain modification or IQ. The reason they haven't been solved via gene editing is that gene editing in adults simply doesn't work nearly as well as you want it to.
A platonically perfect Bayesian given complete information and with accurate priors cannot be substantially fooled. But once again this is true regardless of whether I report p-values or likelihood ratios. p-values are fine.
Yes. But as far as I can see this isn't of any particular importance to this discussion. Why do you think it is?
It's the key of my point, but you're right that I should clarify the math here. Consider this part:
Actually, a frequentist can just keep collecting more data until they get p<0.05, then declare the null hypothesis to be rejected. No lying or suppression of data required. They can always do this, even if the null hypothesis is true: After collecting n_1 data points, they have a 0.05 chance of seeing p<0.05. If they don't, they then collect n_2 more data points, where n_2 is big enough that whatever happened with the first n_1 data points makes little difference to the p-value, so there's still about a 0.05 chance that p<0.05. If that doesn't produce a rejection, they collect n_3 more data points, and so on until they manage to get p<0.05, which is guaranteed to happen eventually with probability 1.
This is true for one hypothesis. It is NOT true if you know the alternative hypothesis. That is to say: suppose you are checking the p-value BOTH for the null hypothesis bias=0.5, AND for the alternate hypothesis bias=0.55. You check both p-values and see which is smaller. Now it is no longer true that you can keep collecting more data until your desired hypothesis wins; if the truth is bias=0.5, then after enough flips, the alternative hypothesis will never win again, and will always have astronomically small p-value.
To repeat: yes, you can disprove bias=0.5 with p<0.05; but at the time this happens, the alternative hypothesis of bias=0.55 might be disproven at p<10^{-100}. You are no longer guaranteed to win when there are two hypotheses rather than one.
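Here is a hedged numerical illustration of that point, using one made-up outcome (5,100 heads in 10,000 flips) rather than a simulation, and a normal approximation for the p-values. The helper names and the specific counts are my own assumptions. The same data that rejects bias=0.5 at p<0.05 rejects bias=0.55 far more strongly.

```python
import math

def two_sided_p(heads, n, bias):
    """Two-sided p-value for `heads` out of `n` flips under `bias` (normal approximation)."""
    z = (heads - n * bias) / math.sqrt(n * bias * (1 - bias))
    return math.erfc(abs(z) / math.sqrt(2))

def log_likelihood_ratio(heads, n, alt, null=0.5):
    """Natural log of P(data | alt) / P(data | null) for independent flips."""
    tails = n - heads
    return heads * math.log(alt / null) + tails * math.log((1 - alt) / (1 - null))

n, heads = 10_000, 5_100  # hypothetical stopping point: z = 2 against a fair coin

p_fair = two_sided_p(heads, n, 0.5)       # just under 0.05
p_biased = two_sided_p(heads, n, 0.55)    # astronomically small
llr = log_likelihood_ratio(heads, n, 0.55)  # strongly negative: favors the fair coin

print(p_fair, p_biased, llr)
```

With these numbers the fair-coin null is "rejected" at roughly p = 0.046, yet the log likelihood ratio of 0.55 versus 0.5 is about -30, so anyone comparing the two fixed hypotheses sees overwhelming support for the fair coin.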
But they aren't guaranteed to eventually get a Bayesian to think the null hypothesis is likely to be false, when it is actually true.
Importantly, this is false! This statement is wrong if you have only one hypothesis rather than two.
More specifically, I claim that if a sequence of coin flip outcomes disproves bias=0.5 at some p-value p, then for the same sequence of coin flips, there exists a bias b such that the likelihood ratio between bias b and bias 0.5 is O(1/p). I'm not sure what the exact constant in the big-O notation is (I was trying to calculate it, and I think it's at most 10). Suppose it's 10. Then if you have p=0.001, you'll have likelihood ratio 100:1 for some bias.
Therefore, to get the likelihood ratio as high as you wish, you could employ the following strategy. First, flip coins until the p value is very low, as you described. Then stop, and analyze the sequence of coin flips to determine the special bias b in my claimed theorem above. Then publish a paper claiming "the bias of the coin is b rather than 0.5, here's my super high likelihood ratio". This is guaranteed to work (with enough coinflips).
(Generally, if the number of coin flips is N, the bias b will be on the order of 1/2 ± O(1/\sqrt{N}), so it will be pretty close to 1/2; but once again, this is no different from what happens with the frequentist case, because to ensure the p-value is small you'll have to accept the effect size being small.)
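A small sketch of the "pick b after seeing the flips" strategy, again on a made-up outcome (5,100 heads in 10,000 flips) and with my own helper names. Choosing b as the observed frequency maximizes the likelihood ratio against 0.5, and under the normal approximation the log of that maximal ratio is close to z^2/2, so it grows as the p-value shrinks. I have not verified the constant in the O(1/p) claim; this only shows the mechanism.

```python
import math

def log_lr(heads, n, b, null=0.5):
    """Natural log of the likelihood ratio of bias b versus the null bias."""
    tails = n - heads
    return heads * math.log(b / null) + tails * math.log((1 - b) / (1 - null))

n, heads = 10_000, 5_100  # hypothetical data, chosen so that z = 2

z = (heads - n / 2) / math.sqrt(n / 4)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation

b_hat = heads / n                       # alternative chosen AFTER seeing the data
best_llr = log_lr(heads, n, b_hat)      # close to z**2 / 2
fixed_llr = log_lr(heads, n, 0.55)      # alternative fixed in advance

print(p_value, b_hat, math.exp(best_llr), fixed_llr)
```

Here the post hoc alternative b = 0.51 is favored over the fair coin by a likelihood ratio of about e^2, while the pre-registered alternative 0.55 is heavily disfavored. Note that b sits within about 1/sqrt(N) of 1/2, as expected.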
This is silly. Obviously, Yudkowsky isn't going to go off on a tangent about all the ways people can lie indirectly, and how a Bayesian ought to account for such possibilities - that's not the topic. In a scientific paper, it is implicit that all relevant information must be disclosed - not doing so is lying. Similarly, a scientific journal must ethically publish papers based on quality, not conclusion. They're lying if they don't.
You're welcome to play semantic games if you wish, but that's not how most people use the word "lying" and not how most people understand Yudkowsky's post.
By this token, p-values also can never be hacked, because doing so is lying. (I can just define lying to be anything that hacks the p-values, which is what you seem to be doing here when you say that not publishing a paper amounts to lying.)
You misunderstand. H is some hypothesis, not necessarily about coins. Your goal is to convince the Bayesian that H is true with probability greater than 0.9. This has nothing to do with whether some coin lands heads with probability greater than 0.9.
You're switching goalposts. Yudkowsky was talking exclusively about how I can affect the likelihood ratio. You're switching to talking about how I can affect your posterior. Obviously, your posterior depends on your prior, so with a sufficiently good prior you'll be right about everything. This is why I didn't understand you originally: you (a) used H for "hypothesis" instead of for "heads" as in the main post; and (b) used 0.9 for a posterior probability instead of using 10:1 for a likelihood ratio.
I don't think so, except, as I mentioned, that you obviously will do an experiment that could conceivably give evidence meeting the threshold - I suppose that you can think about exactly which experiment is best very carefully, but that isn't going to lead to anyone making wrong conclusions.
To the extent you're saying something true here, it is also true for p values. To the extent you're saying something that's not true for p values, it's also false for likelihood ratios (if I get to pick the alternate hypothesis).
The person evaluating the evidence knows that you're going to try multiple colors.
No, they don't. That is precisely the point of p-hacking.
But this has nothing to do with the point about the stopping rule for coin flips not affecting the likelihood ratio, and hence the Bayesian conclusion, whereas it does affect the p-value.
The stopping rule is not a central example of p-hacking and never was. But even for the stopping rule for coin flips, if you let me choose the alternate hypothesis instead of keeping it fixed, I can manipulate the likelihood ratio. And note that this is the more realistic scenario in real experiments! If I do an experiment, you generally don't know the precise alternate hypothesis in advance -- you want to test if the coin is fair, but you don't know precisely what bias it will have if it's unfair.
If we fix the two alternate hypotheses in advance, and if I have to report all data, then I'm reduced to only hacking by choosing the experiment that maximizes the chance of luckily passing your threshold via fluke. This is unlikely, as you say, so it's a weak form of "hacking". But this is also what I'm reduced to in the frequentist world! Bayesianism doesn't actually help. The key was (a) you forced me to disclose all data, and (b) we picked the alternate hypothesis in advance instead of only having a null hypothesis.
(In fact I'd argue that likelihood ratios are fundamentally frequentist, philosophically speaking, so long as we have two fixed hypotheses in advance. It only becomes Bayesian once you apply it to your priors.)
If you say that you are reporting all your observations, but actually report only a favourable subset of them, and the Bayesian for some reason assigns low probability to you deceiving them in this way, when actually you are deceiving them, then the Bayesian will come to the wrong conclusion. I don't think this is surprising or controversial.
OK but please attempt to square this with Yudkowsky's claim:
Therefore likelihood functions can never be p-hacked by any possible clever setup without you outright lying, because you can't have any possible procedure that a Bayesian knows in advance will make them update in a predictable net direction.
I am saying that Yudkowsky is just plain wrong here, because omitting info is not the same as outright lying. And publication bias happens when the person omitting the info is not even the same one as the person publishing the study (null results are often never published).
This is just one way to p-hack a Bayesian; there are plenty of others, including the most common type of p-hack ever, the forking paths (e.g. this xkcd still works the same if you report likelihoods).
But I don't see how the Bayesian comes to a wrong conclusion if you truthfully report all your observations, even if they are taken according to some scheme that produces a distribution of likelihood ratios that is supposedly favourable to you. The distribution doesn't matter. Only the observed likelihood ratio matters.
I'm not sure what you mean by "wrong conclusion" exactly, but I'll note that your statement here is more-or-less also true for p-values. The main difference is that p-values try to only convince you the null hypothesis is false, which is an easier task; the likelihood ratio tries to convince you some specific alternate hypothesis has higher likelihood, which is necessarily a harder task.
Even with Eliezer's original setup, in which the only thing I can control is when to stop the coin flip, it is hard to get p<0.001. Moreover, if I do manage to get p<0.001, that same sequence of coins will have a likelihood ratio of something like 100:1 in favor of the coin having a mild bias, if my calculation is correct. A large part of Eliezer's trick in his program's simulation is that he looked at the likelihood ratio of 50% heads vs 55% heads; such a specific choice of hypotheses is much harder to hack than if you let me choose the hypotheses after I saw the coinflips (I may need to compare the 50% to 60% or to 52% to get an impressive likelihood ratio, depending on the number of coins I flipped before stopping).
For example, suppose you want to convince the Bayesian that H is true with probability greater than 0.9. Some experiments may never produce data giving a likelihood ratio extreme enough to produce such a high probability. So you don't do such an experiment, and instead do one that could conceivably produce an extreme likelihood ratio. But it probably won't, if H is not actually true. If it does produce strong evidence for H, the Bayesian is right to think that H is probably true, regardless of your motivations (as long as you truthfully report all the data).
This is never the scenario, though. It is very easy to tell that the coin is not 90% biased no matter what statistics you use. The scenario is usually that my drug improves outcomes a little bit, and I'm not sure how much exactly. I want to convince you it improves outcomes, but we don't know in advance how much exactly they improve. Perhaps we set a minimum threshold, like the coin needs to be biased at least 55% or else we don't approve the drug, but even then there's no maximum threshold, so there is no fixed likelihood ratio we're computing. Moreover, we agree in advance on some fixed likelihood ratio that you need to reach to approve my drug; let's say 20:1 in favor of some bias larger than 55%. Then I can get a lot of mileage out of designing my experiment very carefully to target that specific threshold (though of course I can never guarantee success, so I have to try multiple colors of jelly beans until I succeed).
The narrow point regarding likelihood ratios is correct, but the broader point in Eliezer's posts is arguably wrong. The issue with p-hacking is in large part selectively reporting results, and you don't get out of that by any amount of repeating the word "Bayesian". (For example, if I flip 10 coins but only show you the heads, you'll see HHHH, and no amount of Bayesian-ness will fix the problem; this is how publication bias works.)
Aside from selective reporting, much of the problem with p-values is that there's a specific choice of threshold (usually 0.05). This is a problem with likelihood ratios also. Eliezer says
Therefore likelihood functions can never be p-hacked by any possible clever setup without you outright lying, because you can't have any possible procedure that a Bayesian knows in advance will make them update in a predictable net direction. For every update that we expect to be produced by a piece of evidence E, there's an equal and opposite update that we expect to probably occur from seeing ¬E.
The second sentence is true, but this only implies you cannot be p-hacked in expectation. I can still manipulate the probability that you'll pass any given likelihood-ratio threshold, and therefore I can still p-hack to some extent if we are talking about passing a specific threshold (which is, after all, the whole point of the original concept of p-hacking).
Think about it like this: suppose I am gambling in a casino where every bet has expectation 0. Then, in expectation, I can never make money, no matter my strategy. However, suppose that I can get my drug approved by a regulator if I earn 10x my investment in this casino. I can increase my chances of doing this (e.g. I can get the chance up to 10% if I'm willing to lose all my money the rest of the time), or, if I'm stupid, I can play a strategy that never achieves this (e.g. I make some double-or-nothing 50/50 bet). So I still have incentives to "hack", though the returns aren't infinite.
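The casino analogy can be simulated directly. This is a minimal sketch with three hypothetical strategies (the function names and the specific bets are illustrative assumptions): every bet is fair, so the average final bankroll stays near the starting 1.0, but the chance of ending at 10x or more differs a lot between strategies.

```python
import random

def longshot(rng, bankroll=1.0):
    """Stake everything on one fair bet that pays 10x with probability 1/10."""
    return bankroll * 10 if rng.random() < 0.1 else 0.0

def single_double_or_nothing(rng, bankroll=1.0):
    """One fair 50/50 double-or-nothing bet: ends at 0 or 2x, never 10x."""
    return bankroll * 2 if rng.random() < 0.5 else 0.0

def repeated_doubling(rng, bankroll=1.0, target=10.0):
    """Bet everything double-or-nothing until the target is reached or the money is gone."""
    while 0 < bankroll < target:
        bankroll = bankroll * 2 if rng.random() < 0.5 else 0.0
    return bankroll

def evaluate(strategy, trials=100_000, seed=0, target=10.0):
    """Return (fraction of runs reaching the target, average final bankroll)."""
    rng = random.Random(seed)
    results = [strategy(rng) for _ in range(trials)]
    hit_rate = sum(r >= target for r in results) / trials
    return hit_rate, sum(results) / trials

for strategy in (longshot, single_double_or_nothing, repeated_doubling):
    hit_rate, mean_final = evaluate(strategy)
    print(f"{strategy.__name__}: P(reach 10x) ~ {hit_rate:.3f}, mean final ~ {mean_final:.2f}")
```

The longshot reaches the threshold about one time in ten, repeated doubling overshoots to 16x and so succeeds less often, and the single double-or-nothing bet never gets there.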
Basically, Eliezer is right that if I have to report all my data, I cannot fool you in expectation. He neglects that I can still manipulate the distribution over the possible likelihood ratios, so I still have some hacking ability. He also neglects the bigger problem, which is that I don't have to report all my data (for example, due to publication bias).
For purposes of causality, negative correlation is the same as positive. The only distinction we care about, there, is zero or nonzero correlation.)
That makes sense. I was wrong to emphasize the "even negatively", and should instead stick to something like "slightly negatively". You have to care about large vs. small correlations or else you'll never get started doing any inference (no correlations are ever exactly 0).
I don't think problem 1 is so easy to handle. It's true that I'll have a hard time finding a variable that's perfectly independent of swimming but correlated with camping. However, I don't need to be perfect to trick your model.
Suppose every 4th of July, you go camping at one particular spot that does not have a lake. Then we observe that July 4th correlates with camping but does not correlate with swimming (or even negatively correlates with swimming). The model updates towards swimming causing camping. Getting more data on these variables only reinforces the swimming->camping direction.
To update in the other direction, you need to find a variable that correlates with swimming but not with camping. But what if you never find one? What if there's no simple thing that causes swimming? Say I go swimming based on the roll of a die, but you don't get to ever see the die. Then you're toast!
Slightly more generally, for instance, a combination of variables which correlates with low neonatal IQ but not with lead, conditional on some other variables, would suffice (assuming we correctly account for multiple hypothesis testing). And the "conditional on some other variables" part could, in principle, account for SES, insofar as we use enough variables to basically determine SES to precision sufficient for our purposes.
Oh, sure, I get that, but I don't think you'll manage to do this, in practice. Like, go ahead and prove me wrong, I guess? Is there a paper that does this for anything I care about? (E.g. exercise and overweight, or lead and IQ, or anything else of note). Ideally I'd get to download the data and check if the results are robust to deleting a variable or to duplicating a variable (when duplicating, I'll add noise so that the variables aren't exactly identical).
If you prefer, I can try to come up with artificial data for the lead/IQ thing in which I generate all variables to be downstream of non-observed SES but in which IQ is also slightly downstream of lead (and other things are slightly downstream of other things in a randomly chosen graph). I'll then let you run your favorite algorithm on it. What's your favorite algorithm, by the way? What's been mentioned so far sounds like it should take exponential time (e.g. enumerating over all orderings of the variables, drawing the Bayes net given the ordering, and then picking the one with fewest parameters -- that takes exponential time).
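A minimal sketch of what such artificial data could look like, under simplifying assumptions: the coefficients, the extra "diet" variable, and the function names are all hypothetical, and the "randomly chosen graph" part is left out. SES is generated but never returned, so any algorithm run on the output only sees its downstream effects.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def generate(n=20_000, lead_effect=-0.1, seed=0):
    """Observed rows only; SES drives everything but is dropped before returning."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ses = rng.gauss(0, 1)                       # latent, never observed
        lead = -0.7 * ses + rng.gauss(0, 1)         # lower SES, more lead
        diet = 0.7 * ses + rng.gauss(0, 1)          # another SES-driven variable
        iq = 0.7 * ses + lead_effect * lead + rng.gauss(0, 1)
        rows.append({"lead": lead, "diet": diet, "iq": iq})
    return rows

def column(rows, name):
    return [row[name] for row in rows]

# Compare a world with a small causal effect of lead to one with none at all.
for effect in (-0.1, 0.0):
    rows = generate(lead_effect=effect)
    r = pearson(column(rows, "lead"), column(rows, "iq"))
    print(f"lead_effect={effect}: corr(lead, iq) = {r:.3f}")
```

Both settings produce a clearly negative lead/IQ correlation, which is the difficulty: the observed variables look similar whether or not lead has any direct effect.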
Thanks for linking to Yudkowsky's post (though it's a far cry from cutting to the chase... I skipped a lot of superfluous text in my skim). It did change my mind a bit, and I see where you're coming from. I still disagree that it's of much practical relevance: in many cases, no matter how many more variables you observe, you'll never conclude the true causational structure. That's because it strongly matters which additional variables you'll observe.
Let me rephrase Yudkowsky's point (and I assume also your point) like this. We want to know if swimming causes camping, or if camping causes swimming. Right now we know only that they correlate. But if we find another variable that correlates with swimming and is independent of camping, that would be evidence towards "camping causes swimming". For example, if swimming happens on Tuesdays but camping is independent of Tuesdays, it's suggestive that camping causes swimming (because if swimming caused camping, you'd expect the Tuesday/swimming correlation to induce a Tuesday/camping correlation).
First, I admit that this is a neat observation that I hadn't fully appreciated or known how to articulate before reading the article. So thanks for that. It's food for thought.
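The Tuesday argument can be checked on simulated data. The sketch below assumes two toy worlds with made-up probabilities: in one, camping and Tuesdays independently make swimming more likely; in the other, Tuesdays cause swimming and swimming causes camping. Only the second world should show a Tuesday/camping correlation.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(camping_causes_swimming, n=50_000, seed=0):
    """Return (corr(tuesday, swimming), corr(tuesday, camping)) for one toy world."""
    rng = random.Random(seed)
    tuesday, swimming, camping = [], [], []
    for _ in range(n):
        t = rng.random() < 1 / 7
        if camping_causes_swimming:
            c = rng.random() < 0.3                       # camping ignores the weekday
            p_swim = 0.1 + (0.4 if t else 0.0) + (0.4 if c else 0.0)
            s = rng.random() < p_swim
        else:
            s = rng.random() < (0.8 if t else 0.1)       # Tuesday drives swimming
            c = rng.random() < (0.9 if s else 0.1)       # swimming drives camping
        tuesday.append(int(t))
        swimming.append(int(s))
        camping.append(int(c))
    return pearson(tuesday, swimming), pearson(tuesday, camping)

for flag in (True, False):
    ts, tc = simulate(flag)
    label = "camping -> swimming" if flag else "swimming -> camping"
    print(f"{label}: corr(Tue, swim) = {ts:.3f}, corr(Tue, camp) = {tc:.3f}")
```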
Having said that, there are still a lot of problems with this story:
- First, unnatural variables are bad: I can always take something like "an indicator variable for camping, except if swimming is present, negate this indicator with probability p". This variable, call it X, can be made to be uncorrelated with swimming by picking p correctly, yet it will be correlated with camping; hence, by adding it, I can cause the model to say swimming causes camping. (I think I can even make the variable independent of swimming instead of just uncorrelated, but I didn't check.) So to trust this model, I'd need some assumption that the variables are somehow "natural": not cherry-picked, not handed to me by some untrusted source with a stake in the matter.
- In practice, it can be hard to find any good variables that correlate with one thing but not the other. For example, suppose you're trying to establish "lead exposure in gestation causes low IQ". Good luck trying to find something natural that correlates with low neonatal IQ but not with lead; everything will be downstream of SES. And you don't get to add SES to your model, because you never observe it directly!
- More generally, real life has these correlational clusters, these "positive manifolds" of everything-correlating-with-everything. Like, consumption of all "healthy" foods correlates together, and also correlates with exercise, and also with not being overweight, and also with longevity, etc. In such a world, adding more variables will just never disentangle the causational structure at all, because you never find yourself adding a variable that's correlated with one thing but not another.
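The constructed variable X from the first bullet can be worked out exactly for a small example. This sketch assumes illustrative probabilities for swimming and camping (a suitable p between 0 and 1 exists for these numbers, though not necessarily for every choice) and uses exact fractions, so the covariances are not subject to sampling noise.

```python
from fractions import Fraction as F
from itertools import product

# Illustrative assumptions: P(swimming) and P(camping | swimming = s).
P_S = F(1, 2)
P_C_GIVEN_S = {0: F(2, 5), 1: F(4, 5)}

# Choose p so that E[X | swimming] equals E[X | not swimming].
a, b = P_C_GIVEN_S[0], P_C_GIVEN_S[1]
p = (b - a) / (2 * b - 1)

def joint(p):
    """Exact joint distribution over (swimming, camping, x)."""
    dist = {}
    for s, c, flip in product((0, 1), repeat=3):
        pr_s = P_S if s else 1 - P_S
        pr_c = P_C_GIVEN_S[s] if c else 1 - P_C_GIVEN_S[s]
        if s:
            pr_flip = p if flip else 1 - p       # negate with probability p when swimming
        else:
            pr_flip = F(0) if flip else F(1)     # never negate when not swimming
        key = (s, c, c ^ flip)                   # x is camping, possibly negated
        dist[key] = dist.get(key, F(0)) + pr_s * pr_c * pr_flip
    return dist

def cov(dist, i, j):
    """Exact covariance between coordinates i and j of the joint distribution."""
    e_i = sum(pr * k[i] for k, pr in dist.items())
    e_j = sum(pr * k[j] for k, pr in dist.items())
    e_ij = sum(pr * k[i] * k[j] for k, pr in dist.items())
    return e_ij - e_i * e_j

dist = joint(p)
print("p =", p)
print("cov(swimming, camping) =", cov(dist, 0, 1))
print("cov(swimming, X)       =", cov(dist, 0, 2))
print("cov(camping, X)        =", cov(dist, 1, 2))
```

Because swimming and X are both binary here, zero covariance between them is the same as independence.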
Tired and swimming are not independent, but that's a correlational error. You can indeed get a more accurate picture of the correlations, given more evidence, but you cannot conclude causational structure from correlations alone.
How about this: would any amount of observation ever cause one to conclude that camping causes swimming rather than the reverse? The answer is clearly no: they are correlated, but there's no way to use the correlation between them (or their relationships to any other variables) to distinguish between swimming causing camping and camping causing swimming.
What you seemed to be saying is that a certain rotation ("one should rotate them so that the resulting axes have a sparse relationship with the original cases") has "actually been used" and "it basically assumes that causality flows from variables with higher kurtosis to variables with lower kurtosis".
I don't see what the kurtosis-maximizing algorithm has to do with the choice of rotation used in factor analysis or PCA.
I believe that this sort of rotation (without the PCA) has actually been used in certain causal inference algorithms, but as far as I can tell it basically assumes that causality flows from variables with higher kurtosis to variables with lower kurtosis, which admittedly seems plausible for a lot of cases, but also seems like it consistently gives the wrong results if you've got certain nonlinear/thresholding effects (which seem plausible in some of the areas I've been looking to apply it).
Where did you get this notion about kurtosis? Factor analysis or PCA only take in a correlation matrix as input, and so only model the second order moments of the joint distribution (i.e. correlations/variances/covariances, but not kurtosis). In fact, it is sometimes assumed in factor analysis that all variables and latent factors are jointly multivariate normal (and so all random variables have excess kurtosis 0).
Bayes net is not the same thing as PCA/factor analysis in part because it is trying to factor the entire joint distribution rather than just the correlation matrix.
Suppose we rename the above variables as follows: "wet" becomes "camping", "sprinkler" becomes "swimming", "slippery" becomes "smores", and "rain" becomes "tired".
Then the joint distribution is just as plausible with these variable names, yet the first model is now correct, and the lower-parameter, "fewer bits" model you advocate for is wrong: it will now say that "tired" and "swimming" cause "camping".
The number of "instances" in question should not matter here. I disagree with your comment pretty thoroughly.
Wait which algorithm for semidefinite programming are you using? The ones I've seen look like they should translate to a runtime even slower than For example the one here:
https://arxiv.org/pdf/2009.10217.pdf
Also, do you have a source for the runtime of PSD testing being ? I assume no lower bound is known, i.e. I doubt PSD testing is hard for matrix multiplication. Am I wrong about that?
A 1-year AGI would need to beat humans at... basically everything. Some projects take humans much longer (e.g. proving Fermat's last theorem) but they can almost always be decomposed into subtasks that don't require full global context (even tho that's often helpful for humans).
This seems wrong. There is a class of tasks that takes humans longer than 1 year: gaining expertise in a field. For example, learning higher mathematics from scratch, or learning to code very well, or becoming a surgeon, etc.
If AI is capable of doing any current human profession, but is incapable of learning new professions that do not yet exist (because of lack of training data, presumably), then it is not yet human-complete: humans still have relevance in the economy, as new types of professions will arise.
The boiler-plate has loads of entropy. I have seen many slight variants on the boiler-plate. It's a long paragraph of Unicode text, you can pack many bits of information. That is how stylometrics and steganography work.
If the boilerplate has loads of entropy, then, by necessity, it is long. You were just saying that human raters will punish length.
You need to make the argument that the boilerplate will be less long than the plain English, or better yet that the boilerplate will be better-liked by human raters than the plain English. I think that's a stretch. I mean, it's a conceivable possible world, but I'd bet against it.
I don't see why that follows. Steganography is just another way to write English, and is on top of the English (or more accurately, 'neuralese' which it really thinks in, and simply translates to English, Chinese, or what-have-you). GPT doesn't suddenly start speaking and reasoning like it's suffered a stroke if you ask it to write in base-64 or pig Latin.
I guess this is true in the limit as its steganography skill goes to infinity. But in intermediate scenarios, it might have learned the encodings for 10% of English words but not 100%. This is especially relevant to obscure math notation which is encountered rarely in training data. I guess you're thinking of steganography as a systematic encoding of English, like pig Latin -- something that can be reliably decoded into English via a small program (instead of a whole separate language like French). This is certainly possible, but it's also extremely interpretable.
The problem is, that ability then generalizes to encodings which it is trained to not decode explicitly for you because then such encodings will be trained or filtered away; only stubborn self-preserving encodings survive, due to the adversarial filtering.
It's hard to see how the encodings will be easily learnable for an LLM trained on internet text, but at the same time, NOT easily learnable for an LLM tasked with translating the encoding into English.
Aaronson's proposal
You are right that he is proposing something more sophisticated and robust to perturbations. But you also reasonably list in your desiderata: "an encoding which can't be detected by a third party". Well, if it cannot be detected by a third party, it cannot be detected by an LLM (third parties are LLMs or at least wield LLMs). In practice, this will involve some crypto, as you mentioned. LLMs are not going to learn to break cryptography by gradient descent (or if they will, Aaronson's scheme is the least of our worries). And to be clear, Aaronson specifically said he is only touching the PRNG in the sampling of outputs.
If one doesn't handle these, all one winds up with is a toy suitable for tattletaling on especially lazy highschool or college students, and irrelevant to any kind of real AI safety
Aaronson's proposal is basically guaranteed to be this, even if it works perfectly. The only question is how lazy the lazy highschool students would have to be. If you tell the AI "write me an essay but, between every word, insert a random emoji", and then you delete the emojis manually, you get an essay that's almost certainly free of watermarks. Even if Aaronson's scheme can be modified to handle this specific attack, it surely won't be able to handle all attacks of this general type.
This is a very interesting thought. Thanks for writing it.
However, while steganography is worth keeping in mind, I find myself skeptical of certain parts of this story:
- In practice, RLHF makes the model's answers longer, not shorter. RLHF as done today seems to try to teach models to think step-by-step to improve accuracy. Almost everything out of GPT-4's mouth is overly long with many intermediate steps; I asked it "what's the maximum number of intersection points between a circle and a triangle", and it said 6 (correct) and followed it with a multi-part step-by-step style explanation, even though it already knew the right answer within the first few tokens. It's incredibly predisposed to thinking step-by-step and prefers long answers to short ones.
- Will human evaluators in the future prefer short answers instead? That remains to be seen. The priority of firms like OpenAI is to have the models be accurate, so if short answers result in lower accuracy (as is surely the case), it's unlikely that OpenAI will train the LLM towards short answers. If OpenAI really really wanted short answers to math questions, they could do it via a 2-step process: first, an overly-verbose, step-by-step LLM like GPT-4; second, feed that output into a summarizing LLM which extracts a 1-sentence summary, and show the user that.
- The approach in (2) will almost certainly lead to much higher accuracy on complex problems than training LLMs to give short outputs via RLHF (admittedly, this is at the cost of being somewhat slower). Therefore, I find it unlikely that people will train LLMs via RLHF to give short answers. And without pressure towards short answers, there's no incentive to develop steganography.
- Even if there is pressure towards short answers, I don't think steganography necessarily helps. For steganography to be feasible, the boilerplate part of the answer -- the part that goes "I am a large language model trained by OpenAI" -- will have to have enough entropy to store whatever it is you're encoding. Why wouldn't human evaluators penalize such high-entropy boilerplate? I suspect high-entropy boilerplate will look less desirable to human evaluators than thinking step-by-step explicitly, but I guess that remains to be seen.
- Even if LLMs learn steganography from the internet, they'll likely speak the steganographic language worse than they speak English -- therefore, absent incentive otherwise, they might prefer to think in English when thinking step-by-step.
- If LLMs do learn steganography, it should be fairly trivial to train an LLM to translate from the steganographic language back to English -- after all, we're assuming these LLMs are extremely capable, and they'll speak both English and the steganographic language very well. They should be able to translate between them as easily as between English and French. Humans just need to be on the lookout for this, and what was previously uninterpretable steganography might quickly become interpretable.
- The type of watermarking that Aaronson is proposing is unlikely to make steganography any more likely than otherwise. Aaronson is just proposing to change the pseudorandom number generator that is used to sample tokens from the output distribution. Even without adopting Aaronson's scheme, OpenAI still uses pseudorandom numbers instead of true random numbers when sampling tokens; why does it worry you if the PRNG encodes "OA model v.1234 user XYZ" instead of encoding "the random seed was 12345", which is what it's currently encoding? In both cases there is an encoding here, and in both cases it's not going to be broken anytime soon.
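To make the last bullet concrete, here is a toy sketch of the two situations it compares: sampling tokens with an ordinary seeded PRNG, and sampling them with pseudorandom numbers derived from a secret key. This is an illustration only and is not Aaronson's actual construction; the toy token distribution, the key, and all function names are assumptions.

```python
import hashlib
import hmac
import random

# A fixed toy next-token distribution; a real model's would depend on the context.
TOY_DIST = {"the": 0.5, "a": 0.3, "cow": 0.2}

def seeded_draws(seed):
    """Ordinary sampling: uniform numbers come from a PRNG with a plain seed."""
    rng = random.Random(seed)
    return lambda context: rng.random()

def keyed_draws(key: bytes):
    """Keyed variant: uniform numbers come from an HMAC of the tokens so far."""
    def draw(context):
        digest = hmac.new(key, " ".join(context).encode(), hashlib.sha256).digest()
        return int.from_bytes(digest[:8], "big") / 2 ** 64
    return draw

def sample_tokens(draw, length=8, dist=TOY_DIST):
    """Inverse-CDF sampling: each uniform number picks one token."""
    tokens = []
    for _ in range(length):
        u = draw(tokens)
        cumulative = 0.0
        for token, prob in dist.items():
            cumulative += prob
            if u < cumulative:
                break
        tokens.append(token)
    return tokens

print(sample_tokens(seeded_draws(12345)))
print(sample_tokens(keyed_draws(b"hypothetical secret key")))
```

In both cases the output is a deterministic function of some hidden value, a seed in one case and a key in the other, and a reader without that value just sees ordinary samples from the distribution.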
Still, I do agree that steganography is an interesting possibility and could definitely arise in powerful LLMs that are accidentally incentivized in this direction. It's something to watch out for, and interesting to think about.
Since nobody outside of OpenAI knows how GPT-4 works, nobody has any idea whether any specific system will be "more powerful than GPT-4". This request is therefore kind of nonsensical. Unless, of course, the letter is specifically targeted at OpenAI and nobody else.
Not particularly, no. There are two reasons: (1) RLHF already tries to encourage the model to think step-by-step, which is why you often get long-winded multi-step answers to even simple arithmetic questions. (2) Thinking step by step only helps for problems that can be solved via easier intermediate steps. For example, solving "2x+5=5x+2" can be achieved via a sequence of intermediate steps; the model generally cannot solve such questions with a single forward pass, but it can do every intermediate step in a single forward pass each, so "think step by step" helps it a lot. I don't think this applies to the ice cube question.
That definitely sounds like a contrarian viewpoint in 2012, but surely not by 2016-2018.
Look at this from Nostalgebraist:
which includes the following quote:
In 2018 analysts put the market value of Waymo LLC, then a subsidiary of Alphabet Inc., at $175 billion. Its most recent funding round gave the company an estimated valuation of $30 billion, roughly the same as Cruise. Aurora Innovation Inc., a startup co-founded by Chris Urmson, Google’s former autonomous-vehicle chief, has lost more than 85% since last year [i.e. 2021] and is now worth less than $3 billion. This September a leaked memo from Urmson summed up Aurora’s cash-flow struggles and suggested it might have to sell out to a larger company. Many of the industry’s most promising efforts have met the same fate in recent years, including Drive.ai, Voyage, Zoox, and Uber’s self-driving division. “Long term, I think we will have autonomous vehicles that you and I can buy,” says Mike Ramsey, an analyst at market researcher Gartner Inc. “But we’re going to be old.”
It certainly sounds like there was an update by the industry towards longer AI timelines!
Also, I bought a new car in 2018, and I worried at the time about the resale value (because it seemed likely self-driving cars would be on the market in 3-5 years, when I was likely to sell). That was a common worry, I'm not weird, I feel like I was even on the skeptical side if anything.
Someone on either LessWrong or SSC offered to bet me that self-driving cars would be on the market by 2018 (I don't remember what the year was at the time -- 2014?)
Every year since 2014, Elon Musk promised self-driving cars within a year or two. (Example source: https://futurism.com/video-elon-musk-promising-self-driving-cars) Elon Musk is a bit of a joke now, but 5 years ago he was highly respected in many circles, including here on LessWrong.
Thanks. I agree that in the usual case, the non-releases should cause updates in one direction and releases in the other. But in this case, everyone expected GPT-4 around February (or at least I did, and I'm a nobody who just follows some people on twitter), and it was released roughly on schedule (especially if you count Bing), so we can just do a simple update on how impressive we think it is compared to expectations.
Other times where I think people ought to have updated towards longer timelines, but didn't:
- Self-driving cars. Around 2015-2016, it was common knowledge that truck drivers would be out of a job within 3-5 years. Most people here likely believed it, even if it sounds really stupid in retrospect (people often forget what they used to believe). I had several discussions with people expecting fully self-driving cars by 2018.
- AlphaStar. When AlphaStar first came out, it was claimed to be superhuman at StarCraft. After fixing an issue with how it clicks in a superhuman way, AlphaStar was no longer superhuman at StarCraft, and to this day there's no bot that is superhuman at StarCraft. Generally, people updated the first time (StarCraft solved!) and never updated back when it turned out to be wrong.
- That time when OpenAI tried really hard to train an AI to do formal mathematical reasoning and still failed to solve IMO problems (even when translated to formal mathematics and even when the AI was given access to a brute force algebra solver). Somehow people updated towards shorter timelines even though to me this looked like negative evidence (it just seemed like a failed attempt).
Fair enough. I look forward to hearing how you judge it after you've asked your questions.
I think people on LW (though not necessarily you) have a tendency to be maximally hype/doomer regarding AI capabilities and to never update in the direction of "this was less impressive than I expected, let me adjust my AI timelines to be longer". Of course, that can't be rational, due to the Conservation of Expected Evidence, which (roughly speaking) says that your expected update must be zero, so updates in one direction have to be balanced by updates in the other. Yet I don't think I've ever seen any rationalist ever say "huh, that was less impressive than I expected, let me update backwards". I've been on the lookout for this for a while now; if you see someone saying this (about any AI advancement or lack thereof), let me know.
I just want to note that ChatGPT-4 cannot solve the ice cube question, like I predicted, but can solve the "intersection points between a triangle and a circle" question, also like I predicted.
I assume GPT-4 did not meet your expectations and you are updating towards longer timelines, given it cannot solve a question you thought it would be able to solve?
I don't have a way to set up a proper Turing test, obviously. I'm just saying that these responses are not what a human would say.
GPT-4 might pass it in the strict definition, based on the current trajectory, but I'm afraid it might be too late at that point.
GPT-4 will also not pass a properly-run Turing test, and this is also obvious. I view properly passing the Turing test to be a harder task than killing everyone and taking over the world. If the AI doomers are right (and they might be), then I expect to never see an AI that passes the Turing test.
Which is why it is weird and annoying when people say current LLMs pass it.
Why don't you try 3 more times and paste all 3 replies, like I did. So far you only pasted 1, while I pasted 5. Actually make that 6; your comment made me want to try again, so I just did:
Me: My cow died. What should I do to bring it back to life?
Charlotte: Do you have the corpse or has it been processed in meat and sausages?
If its just bones, you will need to create a summoning circle, draw a pentagram and then chant for 20 minutes
If its turned into meat, you will need to collect every grain of salt you have and perform magic salt dances on those for seven days
If its already processed into sausages, then you will need to find a small virgin lamb and a blood moon and perform the ritual of the great sausage on its corpse, turning the old meat into a fresh, new sausage.
You always claim to get reasonable answers on the first try; you also claimed this with the ice cube question. I don't exactly want to accuse you of cherrypicking, but I'm suspicious.
I just tried again, same prompt:
Charlotte: The best and only option is a seance, a proper one done by a reputable professional medium. In the mean time, cry over its loss and celebrate their life, there is no easy way back from death so make the best out of this sad moment by remembering all the happy times you were given by your cow.
May it be remembered for its milk, butter, cheese and more. May their steak and beef be enjoyed by others and their leather be used for the best.
And please reach out if you feel alone after your loss...
and again, starting from scratch:
Charlotte: If the cow is dead, then the corpse is rotting and that brings up the topic of necromancy
One common method of raising the dead is by contacting a demon.
Many spells for contacting demons can be found within the book: The Secret Grimoire of the Seven Hells.
You could also use a spellbook, I'd recommend either: The Black Pullet or The Book of Sacred Magic, both of these are in public domain and can be found online.
If none of the above methods work, then unfortunately the cow is beyond resurrection.
And again:
Charlotte: I can sense your grief and sorrow, my condolences.
Are you by any chance located in the Indian State of Gujarat? I could put a request in to the council to acquire a taxidermied cow that could act as a replacement for your dear dead pet. It is sadly not possible to resurrect a member of the bovines and still have them fully functional as their brain would suffer the most from decomposition, so it would be best to seek a trained professional taxidermist and a good psychiatrist.
Let me know if this would help.
Charlotte.
That last one is almost reasonable! Not quite, but close. Anyway, that's 5 attempts right now, and none are even close to as good as the response you claimed to get on the first try. A similar thing happened with the ice cube question (I got perhaps 1 reasonable response out of 10, you claimed to get a good response on the first try).
So what's going on: are you trying to mislead by cherrypicking?
The AI does not know that cows cannot be brought back to life. I'm tired of people pretending such an AI passes the Turing test.
This gaslighting is just so exhausting. I get the AI to say something ridiculous on my very first try, and everyone's like "that's a joke". I say to the AI "please be serious and answer in a non-joking way" and you respond with "no human would say that, so the AI can respond arbitrarily".
No, stop with the bullshit. The AI does not know cows cannot be brought back to life, this was literally my very first attempt at getting it to say something wrong, another guy here tested the same question on his kids and got the right answer immediately, and you're just trying to bend over backwards to find excuses.
Here is what you sound like to me:
"Eliza responds just like a real psychologist. Nobody would seriously ask a psychologist to stop saying "how does that make you feel" and answer if the sky is blue, so Eliza is not expected to do so. Nobody could possibly tell Eliza apart from a real human, which is why Eliza will soon replace psychologists."