Comments
We've done some experiments with small reversible circuits. Empirically, a small circuit generated in the way you suggest has very obvious structure that makes it satisfy P (i.e. it is immediately evident from looking at the circuit that P holds).
This leaves open the question of whether the same is true as the circuits get large. Our reasons for believing that it is are mostly based on the same "no-coincidence" intuition highlighted by Gowers: a naive heuristic estimate suggests that if there is no special structure in the circuit, the probability that it would satisfy P is doubly exponentially small. So if C does satisfy P, it's probably because of some special structure.
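To give a flavor of that naive heuristic estimate, here is a toy version of the calculation. It's purely illustrative (the real estimate depends on the details of P and of the circuit distribution); the point is just that if P decomposes into exponentially many roughly independent constraints, the probability comes out doubly exponentially small.

```latex
% Toy version of the naive heuristic estimate (illustrative only; the actual
% numbers depend on the details of P and of the circuit distribution).
% Suppose that for an unstructured circuit, P holds only if each of
% K = 2^{cn} roughly independent events holds, each with probability at most 1/2. Then
\[
  \Pr\bigl[\text{random } C \text{ satisfies } P\bigr]
    \;\lesssim\; \Bigl(\tfrac{1}{2}\Bigr)^{K}
    \;=\; 2^{-2^{cn}},
\]
% which is doubly exponentially small in n.
```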
Is this a correct rephrasing of your question?
> It seems like a full explanation of a neural network's low loss on the training set needs to rely on lots of pieces of knowledge that it learns from the training set (e.g. "Barack" is usually followed by "Obama"). How do random "empirical regularities" about the training set like this one fit into the explanation of the neural net?
Our current best guess about what an explanation looks like is something like modeling the distribution of neural activations. Such an activation model would end up having baked-in empirical regularities, like the fact that "Barack" is usually followed by "Obama". So in other words, just as the neural net learned this empirical regularity of the training set, our explanation will also learn the empirical regularity, and that will be part of the explanation of the neural net's low loss.
(There's a lot more to be said here, and our picture of this isn't fully fleshed out: there are some follow-up questions you might ask to which I would answer "I don't know". I'm also not sure I understood your question correctly.)
Yeah, I did a CS PhD in Columbia's theory group and have talked about this conjecture with a few TCS professors.
My guess is that P is true for an exponentially small fraction of circuits. You could plausibly prove this with combinatorics (given that e.g. the first layer randomly puts inputs into gates, which means you could try to reason about the class of circuits that are the same except that the inputs are randomly permuted before being run through the circuit). I haven't gone through this math, though.
Thanks, this is a good question.
My suspicion is that we could replace "99%" with "all but exponentially small probability in n". I also suspect that you could replace it with 1 − ε, with the stipulation that the length of π (or the running time of V) will depend on ε. But I'm not exactly sure how I expect it to depend on ε -- for instance, it might be exponential in 1/ε.
My basic intuition is that the closer you make 99% to 1, the smaller the number of circuits that V is allowed to say "look non-random" (i.e. are flagged for some advice π). And so V is forced to do more thorough checks ("is it actually non-random in the sort of way that could lead to P being true?") before outputting 1.
99% is just a kind-of lazy way to sidestep all of these considerations and state a conjecture that's "spicy" (many theoretical computer scientists think our conjecture is false) without claiming too much / getting bogged down in the details of how the "all but a small fraction of circuits" thing depends on n or the length of π or the runtime of V.
I think this isn't the sort of post that ages well or poorly, because it isn't topical, but I think this post turned out pretty well. It gradually builds from preliminaries that most readers have probably seen before, into some pretty counterintuitive facts that aren't widely appreciated.
At the end of the post, I listed three questions and wrote that I hope to write about some of them soon. I never did, so I figured I'd use this review to briefly give my takes.
- This comment from Fabien Roger tests some of my modeling choices for robustness, and finds that the surprising results of Part IV hold up when the noise is heavier-tailed than the signal. (I'm sure there's more to be said here, but I probably don't have time to do more analysis by the end of the review period.)
- My basic take is that this really is a point in favor of well-evidenced interventions, but that the best-looking speculative interventions are nevertheless better. This is because I think "speculative" here mostly refers to partial measurement rather than noisy measurement. For example, maybe you can only foresee the first-order effects of an intervention, but not the second-order effects. If the first-order effect is a (known) quantity X and the second-order effect is an (unknown) quantity Y, then modeling the second-order effect as zero (and thus estimating the quality of the intervention as X) isn't a noisy measurement; it's a partial measurement. It's still your best guess given the information you have.
- I haven't thought this through very much. I expect good counter-arguments and counter-counter-arguments to exist here.
- No -- or rather, only if the measurement is guaranteed to be exactly correct. To see this, observe that the variance of a noisy, unbiased measurement is greater than the variance of the quantity you're trying to measure (with equality only when the noise is zero), whereas the variance of a noiseless, partial measurement is less than the variance of the quantity you're trying to measure.
- Real-world measurements are absolutely partial. They are, like, mind-bogglingly partial. This point deserves a separate post, but consider for instance the action of donating $5,000 to the Against Malaria Foundation. Maybe your measured effect from the RCT is that it'll save one life: 50 QALYs or so. But this measurement neglects the meat-eating problem: the expected-child you'll save will grow up to eat expected-meat from factory farms, likely causing a great amount of suffering. But then you remember: actually there's a chance that this child will have a one eight-billionth stake in determining the future of the lightcone. Oops, actually this consideration totally dominates the previous two. Does this child have better values than the average human? Again: mind-bogglingly partial!
(The measurements are also, of course, noisy! RCTs are probably about as un-noisy as it gets: for example, making your best guess about the quality of an intervention by drawing inferences from uncontrolled macroeconomic data is much more noisy. So the answer is: generally both noisy and partial, but in some sense, much more partial than noisy -- though I'm not sure how much that comparison matters.)
- The lessons of this post do not generalize to partial measurements at all! This post is entirely about noisy measurements. If you've partially measured the quality of an intervention, estimating the un-measured part using your prior will give you an estimate of intervention quality that you know is probably wrong, but the expected value of your error is zero.
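To make the noisy-vs-partial distinction above concrete, here is a minimal formalization in my own notation (not from the original post), treating the true quality of the intervention as a random variable Q:

```latex
% Minimal formalization (my notation). Q is the true quality of the intervention.
%
% Noisy, unbiased measurement: M = Q + \varepsilon, with \varepsilon independent
% of Q and E[\varepsilon] = 0. Then
\[
  \operatorname{Var}(M) \;=\; \operatorname{Var}(Q) + \operatorname{Var}(\varepsilon)
    \;\ge\; \operatorname{Var}(Q),
\]
% with equality only when the noise is zero.
%
% Noiseless, partial measurement: M' = E[Q | F], where F is the partial
% information you observed. By the law of total variance,
\[
  \operatorname{Var}(M') \;=\; \operatorname{Var}(Q) - \mathbb{E}\bigl[\operatorname{Var}(Q \mid \mathcal{F})\bigr]
    \;\le\; \operatorname{Var}(Q).
\]
% The estimation error Q - M' still has expectation zero, matching "probably
% wrong, but the expected value of your error is zero."
```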
Thanks for writing this. I think this topic is generally a blind spot for LessWrong users, and it's kind of embarrassing how little thought this community (myself included) has given to the question of whether a typical future with human control over AI is good.
(This actually slightly broadens the question, compared to yours. Because you talk about "a human" taking over the world with AGI, and make guesses about the personality of such a human after conditioning on them deciding to do that. But I'm not even confident that AGI-enabled control of the world by e.g. the US government would be good.)
Concretely, I think that a common perspective people take is: "What would it take for the future to go really really well, by my lights?", and the answer to that question probably involves human control of AGI. But that's not really the action-relevant question. The action-relevant question, for deciding whether you want to try to solve alignment, is how the average world with human-controlled AGI compares to the average AGI-controlled world. And... I don't know, in part for the reasons you suggest.
Cool, you've convinced me, thanks.
Edit: well, sort of. I think it depends on what information you're allowing yourself to know when building your statistical model. If you're not letting yourself make guesses about how the LW population was selected, then I still think the SAT thing and the height thing are reasonable. However, if you're actually trying to figure out an estimate of the right answer, you probably shouldn't blind yourself quite that much.
These both seem valid to me! Now, if you have multiple predictors (like SAT and height), then things get messy because you have to consider their covariance and stuff.
Yup, I think that only about 10-15% of LWers would get this question right.
Yeah, I wonder if Zvi used the wrong model (the non-thinking one)? It's specifically the "thinking" model that gets the question right.
Just a few quick comments about my "integer whose square is between 15 and 30" question (search for my name in Zvi's post to find his discussion):
- The phrasing of the question I now prefer is "What is the least integer whose square is between 15 and 30". The integers whose squares lie between 15 and 30 are -5, -4, 4, and 5, and "least" makes it unambiguous that the answer is -5 rather than 4: -5 is less, even though 4 is smaller in magnitude. (This is a normal use of the word "least", e.g. in competition math, that the model is familiar with.)
- This Gemini model answers -5 to both phrasings. As far as I know, no previous model ever said -5 regardless of phrasing, although someone said o1 Pro gets -5. (I don't have a subscription to o1 Pro, so I can't independently check.)
- I'm fairly confident that a majority of elite math competitors (top 500 in the US, say) would get this question right in a math competition (although maybe not in a casual setting where they aren't on their toes).
- But also this is a silly, low-quality question that wouldn't appear in a math competition.
- Does a model getting this question right say anything interesting about the model? I think a little. There's a certain skill of being careful not to make unwarranted assumptions (e.g. that the integer is positive). Math competitors get better at this skill over time. It's not that straightforward to learn.
- I'm a little confused about why Zvi says that the model gets it right in the screenshot, given that the model's final answer is 4. But it seems like the model snatched defeat from the jaws of victory? Like if you cut off the very last sentence, I would call it correct.
- Here's the output I get:
Thank you for making this! My favorite ones are 4, 5, and 12. (Mentioning this in case anyone wants to listen to a few songs but not the full Solstice.)
Yes, very popular in these circles! At the Bay Area Secular Solstice, the Bayesian Choir (the rationalist community's choir) performed Level Up in 2023 and Landsailor this year.
My Spotify Wrapped
Yeah, I agree that that could work. I (weakly) conjecture that they would get better results by doing something more like the thing I described, though.
My random guess is:
- The dark blue bar corresponds to the testing conditions under which the previous SOTA was 2%.
- The light blue bar doesn't cheat (e.g. doesn't let the model run many times and then see if it gets it right on any one of those times) but spends more compute than one would realistically spend (e.g. more than how much you could pay a mathematician to solve the problem), perhaps by running the model 100 to 1000 times and then having the model look at all the runs and try to figure out which run had the most compelling-seeming reasoning.
What's your guess about the percentage of NeurIPS attendees from anglophone countries who could tell you what AGI stands for?
I just donated $5k (through Manifund). Lighthaven has provided a lot of value to me personally, and more generally it seems like a quite good use of money in terms of getting people together to discuss the most important ideas.
More generally, I was pretty disappointed when Good Ventures decided not to fund what I consider to be some of the most effective spaces, such as AI moral patienthood and anything associated with the rationalist community. This has created a funding gap that I'm pretty excited about filling. (See also: Eli's comment.)
Consider pinning this post. I think you should!
It took until I was today years old to realize that reading a book and watching a movie are visually similar experiences for some people!
Let's test this! I made a Twitter poll.
Oh, that's a good point. Here's a freehand map of the US I drew last year (just the borders, not the outline). I feel like I must have been using my mind's eye to draw it.
I think very few people have a very high-fidelity mind's eye. I think the reason that I can't draw a bicycle is that my mind's eye isn't powerful/detailed enough to be able to correctly picture a bicycle. But there's definitely a sense in which I can "picture" a bicycle, and the picture is engaging something sort of like my ability to see things, rather than just being an abstract representation of a bicycle.
(But like, it's not quite literally a picture, in that I'm not, like, hallucinating a bicycle. Like it's not literally in my field of vision.)
Huh! For me, physical and emotional pain are two super different clusters of qualia.
My understanding of Sarah's comment was that the feeling is literally pain. At least for me, the cringe feeling doesn't literally hurt.
I don't really know, sorry. My memory is that 2023 was already pretty bad for incumbent parties (e.g. the right-wing ruling party in Poland lost power), but I'm not sure.
Fair enough, I guess? For context, I wrote this for my own blog and then decided I might as well cross-post to LW. In doing so, I actually softened the language of that section a little bit. But maybe I should've softened it more, I'm not sure.
[Edit: in response to your comment, I've further softened the language.]
Yeah, if you were to use the neighbor method, the correct way to do so would involve post-processing, like you said. My guess, though, is that you would get essentially no value from it even if you did that, and that the information you get from normal polls would pretty much screen off any information you'd get from the neighbor method.
I think this just comes down to me having a narrower definition of a city.
If you ask people who their neighbors are voting for, they will make their best guess about who their neighbors are voting for. Occasionally their best guess will be to assume that their neighbors will vote the same way that they're voting, but usually not. Trump voters in blue areas will mostly answer "Harris" to this question, and Harris voters in red areas will mostly answer "Trump".
Ah, I think I see. Would it be fair to rephrase your question as: if we "re-rolled the dice" a week before the election, how likely was Trump to win?
My answer is probably between 90% and 95%. Basically the way Trump loses is to lose some of his supporters or have way more late deciders decide on Harris. That probably happens if Trump says something egregiously stupid or offensive (on the level of the Access Hollywood tape), or if some really bad news story about him comes out, but not otherwise.
It's a little hard to know what you mean by that. Do you mean something like: given the information known at the time, but allowing myself the hindsight of noticing facts about that information that I may have missed, what should I have thought the probability was?
If so, I think my answer isn't too different from what I believed before the election (essentially 50/50). Though I welcome takes to the contrary.
I'm not sure (see footnote 7), but I think it's quite likely, basically because:
- It's a simpler explanation than the one you give (so the bar for evidence should probably be lower).
- We know from polling data that Hispanic voters -- who are disproportionately foreign-born -- shifted a lot toward Trump.
- The biggest shifts happened in places like Queens, NY, which has many immigrants but (I think?) not very much anti-immigrant sentiment.
That said, I'm not that confident and I wouldn't be shocked if your explanation is correct. Here are some thoughts on how you could try to differentiate between them:
- You could look on the precinct-level rather than the county-level. Some precincts will be very high-% foreign-born (above 50%). If those precincts shifted more than surrounding precincts, that would be evidence in favor of my hypothesis. If they shifted less, that would be evidence in favor of yours.
- If someone did a poll with the questions "How did you vote in 2020", "How did you vote in 2024", and "Were you born in the U.S.", that could more directly answer the question.
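As a rough sketch of the first check (the precinct-level comparison), something like the following would work. The file name and column names here are hypothetical stand-ins; the real analysis would require merging precinct-level returns with census data on foreign-born population share.

```python
# Rough sketch of the precinct-level check. The CSV and its columns are
# hypothetical stand-ins for precinct returns merged with census data.
import pandas as pd

# Assumed columns: precinct_id, pct_foreign_born, trump_share_2020, trump_share_2024
df = pd.read_csv("precinct_returns_with_census.csv")

df["shift"] = df["trump_share_2024"] - df["trump_share_2020"]

high = df[df["pct_foreign_born"] > 0.5]   # majority foreign-born precincts
rest = df[df["pct_foreign_born"] <= 0.5]  # everything else

print("Mean shift toward Trump, majority foreign-born precincts:", high["shift"].mean())
print("Mean shift toward Trump, other precincts:", rest["shift"].mean())

# A bigger shift in the first group is evidence for the "foreign-born voters
# shifted" hypothesis; a smaller shift is evidence for the alternative.
print("Correlation(pct_foreign_born, shift):", df["pct_foreign_born"].corr(df["shift"]))
```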
An interesting thing about this proposal is that it would make every state besides CA, TX, OK, and LA pretty much irrelevant for the outcome of the presidential election. E.g. in this election, whichever candidate won CATXOKLA would have enough electoral votes to win the election, even if the other candidate won every swing state.
...which of course would be unfair to the non-CATXOKLA states, but like, not any more unfair than the current system?
Yeah, that's right -- see this section for the full statements.
Since no one is giving answers, I'll give my super uninformed take. If anyone replies with a disagreement, you should presume that they are right.
During a recession, countries want to spend their money on economic stimulus programs that create jobs and get their citizens to spend more. China seems to be doing this.
Is spending on AI development good for these goals? I'm tempted to say no. One exception is building power plants, which China would maybe need to eventually do in order to build sufficiently large models.
At the same time, China seems to have a pretty big debt problem. Its debt-to-GDP ratio was 288% in 2023 (I think this number accounts not only for national debt but also for local government debt and maybe personal debt, which I think China has a lot of compared to other countries like the United States). This might in practice constrain how much it can spend.
So China is in a position of wanting to spend, but not spend too much, and AI probably isn't a great place for it to spend in order to accomplish its immediate goals.
In other words, I think the recession makes AGI development a lower priority for the Chinese government. It seems quite plausible to me that the recession might delay the creation of a large government project for building AGI by a few years.
(Again, I don't know stuff about this. Maybe someone will reply saying "Actually, China has already created a giant government project for building AGI" with a link.)
Thanks! This makes me curious: is sports betting anomalous (among forms of consumption) in terms of how much it substitutes for financial investing?
I think the "Provably Safe ML" section is my main crux. For example, you write:
> One potential solution is to externally gate the AI system with provable code. In this case, the driving might be handled by an unsafe AI system, but its behavior would have “safety in the loop” by having simpler and provably safe code restrict what the driving system can output, to respect the rules noted above. This does not guarantee that the AI is a safe driver - it just keeps such systems in a provably safe box.
I currently believe that if you try to do this, you will either have to restrict the outputs so much that the car wouldn't be able to drive well, or else fail to prove that the actions allowed by the gate are safe. Perhaps you can elaborate on why this approach seems like it could work?
(I feel similarly about other proposals in that section.)
For what it's worth, I don't have any particular reason to think that that's the reason for her opposition.
> But it seems like SB1047 hasn't been very controversial among CA politicians.
I think this isn't true. Concretely, I bet that if you looked at the distribution of Democratic No votes among bills that reached Newsom's desk, this one would be among the highest (7 No votes and a bunch of not-voting, which I think is just a polite way to vote No; source). I haven't checked and could be wrong!
My take is basically the same as Neel's, though my all-things-considered guess is that he's 60% or so to veto. My position on Manifold is in large part an emotional hedge. (Otherwise I would be placing much smaller bets in the same direction.)
I believe that Pelosi had never once spoken out against a state bill authored by a California Democrat before this.
Probably no longer willing to make the bet, sorry. While my inside view is that Harris is more likely to win than Nate Silver's 72%, I defer to his model enough that my "all things considered" view now puts her win probability around 75%.
[Edit: this comment is probably retracted, although I'm still confused; see discussion below.]
I'd like clarification from Paul and Eliezer on how the bet would resolve, if it were about whether an AI could get IMO silver by 2024.
Besides not fitting in the time constraints (which I think is kind of a cop-out because the process seems pretty parallelizable), I think the main reason that such a bet would resolve no is that problems 1, 2, and 6 had the form "find the right answer and prove it right", whereas the DeepMind AI was given the right answer and merely had to prove it right. Often, finding the right answer is a decent part of the challenge of solving an Olympiad problem. Quoting more extensively from Manifold commenter Balasar:
The "translations" to Lean do some pretty substantial work on behalf of the model. For example, in the theorem for problem 6, the Lean translation that the model is asked to prove includes an answer that was not given in the original IMO problem.
```lean
theorem imo_2024_p6
    (IsAquaesulian : (ℚ → ℚ) → Prop)
    (IsAquaesulian_def : ∀ f, IsAquaesulian f ↔
      ∀ x y, f (x + f y) = f x + y ∨ f (f x + y) = x + f y) :
    IsLeast {(c : ℤ) | ∀ f, IsAquaesulian f →
      {(f r + f (-r)) | (r : ℚ)}.Finite ∧ {(f r + f (-r)) | (r : ℚ)}.ncard ≤ c} 2
```
The model is supposed to prove that "there exists an integer c such that for any aquaesulian function f, there are at most c different rational numbers of the form f(r) + f(−r) for some rational number r, and find the smallest possible value of c".

The original IMO problem does not include that the smallest possible value of c is 2, but the theorem that AlphaProof was given to solve has the number 2 right there in the theorem statement. Part of the problem is to figure out what 2 is.

Link: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/P6/index.html
I'm now happy to make this bet about Trump vs. Harris, if you're interested.
Looks like this bet is voided. My take is roughly that:
- To the extent that our disagreement was rooted in a difference in how much to weight polls vs. priors, I continue to feel good about my side of the bet.
- I wouldn't have made this bet after the debate. I'm not sure to what extent I should have known that Biden would perform terribly. I was blindsided by how poorly he did, but maybe shouldn't have been.
- I definitely wouldn't have made this bet after the assassination attempt, which I think increased Trump's chances. But that event didn't update me on how good my side of the bet was when I made it.
- I think there's like a 75-80% chance that Kamala Harris wins Virginia.
I frequently find myself in the following situation:
Friend: I'm confused about X
Me: Well, I'm not confused about X, but I bet it's because you have more information than me, and if I knew what you knew then I would be confused.
(E.g. my friend who knows more chemistry than me might say "I'm confused about how soap works", and while I have an explanation for why soap works, their confusion is at a deeper level, where if I gave them my explanation of how soap works, it wouldn't actually clarify their confusion.)
This is different from the "usual" state of affairs, where you're not confused but you know more than the other person.
I would love to have a succinct word or phrase for this kind of being not-confused!
Yup, sounds good! I've set myself a reminder for November 9th.
I'd have to think more about 4:1 odds, but definitely happy to make this bet at 3:1 odds. How about my $300 to your $100?
(Edit: my proposal is to consider the bet voided if Biden or Trump dies or isn't the nominee.)
I think the FiveThirtyEight model is pretty bad this year. This makes sense to me, because it's a pretty different model: Nate Silver owns the former FiveThirtyEight model IP (and will be publishing it on his Substack later this month), so FiveThirtyEight needed to create a new model from scratch. They hired G. Elliott Morris, whose 2020 forecasts were pretty crazy in my opinion.
Here are some concrete things about FiveThirtyEight's model that don't make sense to me:
- There's only a 30% chance that Pennsylvania, Michigan, or Wisconsin will be the tipping point state. I think that's way too low; I would put this probability around 65%. In general, their probability distribution over which state will be the tipping point state is way too spread out.
- They expect Biden to win by 2.5 points; currently he's down by 1 point. I buy that there will be some amount of movement toward Biden in expectation because of the economic fundamentals, but 3.5 seems too much as an average-case.
- I think their Voter Power Index (VPI) doesn't make sense. VPI is a measure of how likely a voter in a given state is to flip the entire election. Their VPIs are way too similar. To pick a particularly egregious example, they think that a vote in Delaware is 1/7th as valuable as a vote in Pennsylvania. This is obvious nonsense: a vote in Delaware is less than 1% as valuable as a vote in Pennsylvania. In 2020, Biden won Delaware by 19%. If Biden wins only 50% of the vote in Delaware, he will have lost the election in an almost unprecedented landslide.
I claim that the following is a pretty good approximation to VPI: (probability that the state is the tipping-point state) * (number of electoral votes) / (number of voters). If you use their tipping-point state probabilities, you'll find that Pennsylvania's VPI should be roughly 4.3 times larger than New Hampshire's. Instead, FiveThirtyEight has New Hampshire's VPI being (slightly) higher than Pennsylvania's.

I retract this: the approximation should instead be (tipping-point state probability) / (number of voters). Their VPI numbers now seem pretty consistent with their tipping point probabilities to me, although I still think their tipping point probabilities are wrong.
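For what it's worth, here is the back-of-the-envelope reasoning (my own sketch, not FiveThirtyEight's definition) for why the electoral-vote count drops out of the corrected approximation:

```latex
% Back-of-the-envelope sketch (my reasoning, not FiveThirtyEight's definition).
% A single vote in a state matters only if that state is the tipping-point state
% and the state's popular vote is exactly tied. Under reasonable uncertainty
% about the margin, the chance of an exact tie scales roughly like
% 1/(number of voters in the state), so
\[
  \text{VPI}_{\text{state}} \;\approx\;
    \frac{\Pr[\text{state is the tipping-point state}]}{\text{number of voters in the state}}.
\]
% The electoral-vote count doesn't appear as a separate factor because it is
% already baked into the tipping-point probability.
```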
The Economist also has a model, which gives Trump a 2/3 chance of winning. I think that model is pretty bad too. For example, I think Biden is much more than 70% likely to win Virginia and New Hampshire. I haven't dug into the details of the model to get a better sense of what I think they're doing wrong.