Maybe you could address these problems, but could you do so in a way that is "computationally cheap"? E.g., for forecasting on something like extinction, it is much easier to forecast on a vague outcome than to precisely define it.
I have a writeup on solar storm risk here that could be of interest
Nice consideration, we hadn't considered non-natural asteroids here. I agree this is a consideration as humanity reaches for the stars, or the rest of the solar system.
If you've thought about it a bit more, do you have a sense of your probability over the next 100 years?
To nitpick on your nitpick, in the US, 1000x safer would be 42 deaths yearly. https://en.wikipedia.org/wiki/Motor_vehicle_fatality_rate_in_U.S._by_year
For the whole world, it would be just above 1k (https://en.wikipedia.org/wiki/List_of_countries_by_traffic-related_death_rate#List), but 2032 seems like an ambitious deadline for that.
In addition, it does seem against the spirit of the question to resolve positively solely because of reducing traffic deaths.
To me this looks like circular reasoning: this example supports my conceptual framework because I interpret the example according to the conceptual framework.
Instead, I notice that Stockfish in particular has some salient characteristics that go against the predictions of the conceptual framework:
- It is indeed superhuman
- It is not the case that once Stockfish ends the game that's it. I can rewind Stockfish. I can even make one version of Stockfish play against another. I can make Stockfish play a chess variant. Stockfish doesn't annihilate my physical body when it defeats me
- It is extremely well aligned with my values. I mostly use it to analyze games I've played against other people my level
- If Stockfish wants to win the game and I want an orthogonal goal, like capturing its pawns, this is very feasible
Now, does this even matter for considering whether a superintelligence would or wouldn't trade? Not that much, it's a weak consideration. But insofar as it's a consideration, does it really convince someone who doesn't already buy the frame? Not to me.
This is importantly wrong because the example is in the context of an analogy
getting some pawns : Stockfish : Stockfish's goal of winning the game :: getting a sliver of the Sun's energy : superintelligence : the superintelligence's goals
The analogy is presented as forceful and unambiguous, but it is not. It's instead an example of a system being grossly more capable than humans in some domain, and not opposing a somewhat orthogonal goal
Incidentally you have a typo on "pawn or too" (should be "pawn or two"), which is worrying in the context of how wrong this is.
There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or too. You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.
The bolded part (bolded by me) is just wrong man, here is an example of taking five pawns: https://lichess.org/ru33eAP1#35
Edit: here is one with six. https://lichess.org/SL2FnvRvA1UE
you will not find it easy to take Stockfish's pawns
Seems importantly wrong, in that if your objective is to take a few pawns (say, three), you can easily do this. This seems important in the context of the claim that it's hard to obtain resources from an adversary that cares about things differently.
In the case of stockfish you can also rewind moves.
I disagree with the 5% of switching to a Sundar Pichai hairs simile:
- Prediction market prices are bounded between 0 and 1
- Polymarket has > 1k markets, and maybe 3 to 10 ambiguous resolutions a year. It's more like 0.3% to 1%.
I'm willing to bet 2k USD on my part against a single dollar of yours that if I waterboard you, you'll want to stop before 3 minutes have passed
Interesting, where are you physically located? Also, are you thinking of the unpleasantness of the situation, or are you thinking of the physical asphyxiation component?
You might want to download the Online Encyclopedia of Integer Sequences, e.g., as in here, and play around with it, e.g., look at the least likely completion for a given sequence & so on.
You're right, changed
I ended up solving the equations either analytically (partially with the help of Phil Trammell), https://forum.effectivealtruism.org/posts/FXPaccMDPaEZNyyre/a-model-of-patient-spending-and-movement-building or through simulations https://github.com/NunoSempere/ReverseShooting
https://github.com/NunoSempere/LaborCapitalAndTheOptimalGrowthOfSocialMovements
I found this post super valuable but I found the presentation confusing. Here is a table, provided as is, that I made based on this post & a few other sources:
Source | Amount for 2024 | Note |
---|---|---|
Open Philanthropy | $80M | Projected from past amount |
Foundation Model Taskforce | $20M | 100M GBP but unclear over how many years? |
FLI | $30M | $600M donation in crypto, say you can get $300M out of it, distributed over 10 years |
AI labs | $30M | |
Jaan Tallinn | $20M | See [here](https://jaan.info/philanthropy/) |
NSF | $5M | |
LTFF (not OpenPhil) | $2M | |
Nonlinear fund and donors | $1M | |
Academia | Considered separately | |
GWWC | $1M | |
Total | $189M | Does not consider uncertainty! |
You might also enjoy this review: https://nunosempere.com/blog/2023/04/28/expert-review-epoch-direct-approach/
One particularity of Polymarket is that you couldn't, as of the time of this market, divide $1 into four shares and sell all of them for $1.09. If you could have--well, then this problem wouldn't have existed--but if you could have then this would have been a 9% return.
I don't have a link off the top of my head, but the trade would have been to sell one share of yes for each market. You can do this by splitting $1 into a Yes and No share, and selling the Yes. Specifically in Polymarket you achieve this by adding and then withdrawing liquidity (for a specific type of market called "amm", for "automated market maker", which was the only type supported by Polymarket at the time, though it has since then also added an order book).
By doing this, you earn $1.09 from the sale + $3 from the three events eventually, and the whole thing costs $4, so it's a guaranteed profit. So I guess that I was making a mistake when I said that there was a 9% return in 1.5 months (the payoff is $4.09/$4, or a 2.25% return over 1.5 months, which is much worse).
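The arithmetic above can be checked with a small sketch. The function name and parameters are made up for illustration, and it assumes that exactly one of the four outcomes resolves Yes and that there are no fees or slippage:

```python
def sell_all_yes_return(yes_price_sum: float, n_markets: int) -> float:
    """Return on capital from splitting $1 into a Yes and a No share in each
    of n mutually exclusive markets, selling every Yes share, and holding the
    No shares to resolution.

    Assumptions (not from Polymarket's docs): exactly one outcome resolves
    Yes, and there are no fees or slippage.
    """
    cost = n_markets * 1.0              # $1 per market to mint a Yes/No pair
    sale_proceeds = yes_price_sum       # what the Yes shares sell for in total
    no_payout = (n_markets - 1) * 1.0   # all but one No share pays out $1
    return (sale_proceeds + no_payout) / cost - 1.0


# Four markets whose Yes prices sum to $1.09: ($1.09 + $3) / $4 - 1
print(f"{sell_all_yes_return(1.09, 4):.2%}")
```

The point is that the 9% mispricing is earned on $4 of locked-up capital rather than on $1, which is why the return shrinks to 2.25%.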
The framework is AI strategy nearcasting: trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today’s.
Usage of "nearcasting" here feels pretty fake. "Nowcasting" is a thing because 538/meteorology/etc. has a track record of success in forecasting and decent feedback loops, and extrapolating those a bit seems neat.
But as used in this case, feedback loops are poor, and it just feels like a different analytical beast. So the resemblance to "forecasting" seems a bit icky, particularly if you are going to reference "nearcasting" without explaining it in subsequent posts: <https://ea.greaterwrong.com/posts/75CtdFj79sZrGpGiX/success-without-dignity-a-nearcasting-story-of-avoiding>.
I spent a bit thinking about a replacement term, and I came up with "scenario planning absent radical transformations analysis", or SPARTA for short. Not perfect, though.
See this comment: <https://www.lesswrong.com/posts/yCuzmCsE86BTu9PfA/there-are-no-coherence-theorems?commentId=v2mgDWqirqibHTmKb>
I am not defending the language of the OP's title, I am defending the content of the post.
You don't have strategic voting with probabilistic results. And the degree of strategic voting can also be mitigated.
Copying my second response from the EA forum:
Like, I feel like with the same type of argument that is made in the post I could write a post saying "there are no voting impossibility theorems" and then go ahead and argue that the Arrow's Impossibility Theorem assumptions are not universally proven, and then accuse everyone who ever talked about voting impossibility theorems that they are making "an error" since "those things are not real theorems". And I think everyone working on voting-adjacent impossibility theorems would be pretty justifiedly annoyed by this.
I think that there is some sense in which the character in your example would be right, since:
- Arrow's theorem doesn't bind approval voting.
- Generalizations of Arrow's theorem don't bind probabilistic results, e.g., each candidate is chosen with some probability corresponding to the amount of votes he gets.
Like, if you had someone saying there was "a deep core of electoral process" such that, as electoral processes scale to important decisions, you will necessarily get "highly defective electoral processes", as illustrated in the classic example of the "dangers of the first past the post system". Well, in that case it would be reasonable to wonder whether the assumptions of the theorem bind, or whether there is some system like approval voting which is much less shitty than the theorem provers were expecting, because the assumptions don't hold.
The analogy is imperfect, though, since approval voting is a known decent system, whereas for AI systems we don't have an example friendly AI.
Copying my response from the EA forum:
(if this post is right)
The post does actually seem wrong though.
Glad that I added the caveat.
Also, the title of "there are no coherence arguments" is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don't really understand the semantic argument that is happening where it's trying to say that the cited theorems aren't talking about "coherence", when like, they clearly are.
Well, part of the semantic nuance is that we don't care as much about the coherence theorems that do exist if they will fail to apply to current and future machines
IMO completeness seems quite reasonable to me and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn't seem very dumb in which a highly intelligence, powerful and economically useful system has non-complete preferences).
Here are some scenarios:
- Our highly intelligent system notices that to have complete preferences over all trades would be too computationally expensive, and thus is willing to accept some, even a large degree of incompleteness.
- The highly intelligent system learns to mimic the values of humans, who end up having non-complete preferences, which the agent mimics
- You train a powerful system to do some stuff, but also to detect when it is out of distribution and in that case do nothing. Assuming you can do that, their preference is incomplete, since when offered tradeoffs they always take the default option when out of distribution.
The whole section at the end feels very confused to me. The author asserts that there is "an error" where people assert that "there are coherence theorems", but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence, all of these seem really quite relevant. They might not prove the things in-practice, as many theorems tend to do.
Mmh, then it would be good to differentiate between:
- There are coherence theorems that talk about some agents with some properties
- There are coherence theorems that prove that AI systems as will soon exist in the future will be optimizing utility functions
You could also say a third thing, which would be: there are coherence theorems that strongly hint that AI systems as will soon exist in the future will be optimizing utility functions. They don't prove it, but they make it highly probable because of such and such. In which case having more detail on the such and such would deflate most of the arguments in this post, for me.
For instance:
“‘Coherence arguments’ mean that if you don’t maximize ‘expected utility’ (EU)—that is, if you don’t make every choice in accordance with what gets the highest average score, given consistent preferability scores that you assign to all outcomes—then you will make strictly worse choices by your own lights than if you followed some alternate EU-maximizing strategy (at least in some situations, though they may not arise). For instance, you’ll be vulnerable to ‘money-pumping’—being predictably parted from your money for nothing.
This is just false, because it is not taking into account the cost of doing expected value maximization, since giving consistent preferability scores is just very expensive and hard to do reliably. Like, when I poll people for their preferability scores, they give inconsistent estimates. Instead, they could be doing some expected utility maximization, but the evaluation steps are so expensive that I now basically don't bother doing a more hardcore approximation of expected value for individuals, only for large projects and organizations. And even then, I'm still taking shortcuts and monkey-patches, and not doing pure expected value maximization.
“This post gets somewhat technical and mathematical, but the point can be summarised as:
- You are vulnerable to money pumps only to the extent to which you deviate from the von Neumann-Morgenstern axioms of expected utility.
In other words, using alternate decision theories is bad for your wealth.”
The "in other words" doesn't follow, since EV maximization can be more expensive than the shortcuts.
Then there are other parts that give the strong impression that this expected value maximization will be binding in practice:
“Rephrasing again: we have a wide variety of mathematical theorems all spotlighting, from different angles, the fact that a plan lacking in clumsiness, is possessing of coherence.”
“The overall message here is that there is a set of qualitative behaviors and as long you do not engage in these qualitatively destructive behaviors, you will be behaving as if you have a utility function.”
“The view that utility maximizers are inevitable is supported by a number of coherence theories developed early on in game theory which show that any agent without a consistent utility function is exploitable in some sense.”
Here are some words I wrote that don't quite sit right but which I thought I'd still share: Like, part of the MIRI beat as I understand it is to hold that there is some shining guiding light, some deep nature of intelligence that models will instantiate and make them highly dangerous. But it's not clear to me whether you will in fact get models that instantiate that shining light. Like, you could imagine an alternative view of intelligence where it's just useful monkey patches all the way down, and as we train more powerful models, they get more of the monkey patches, but without the fundamentals. The view in between would be that there are some monkey patches, and there are some deep generalizations, but then I want to know whether the coherence systems will bind to those kinds of agents.
No need to respond/deeply engage, but I'd appreciate if you let me know if the above comments were too nitpicky.
I am also curious about the extent to which you are taking the Hoffmann scaling laws as an assumption, rather than as something you can assign uncertainty over.
I thought this was great, cheers.
Here:
Next, we estimate a sufficient horizon length, which I'll call the k-horizon, over which we expect the most complex reasoning to emerge during the transformative task. For the case of scientific research, we might reasonably take the k-horizon to roughly be the length of an average scientific paper, which is likely between 3,000 and 10,000 words. However, we can also explicitly model our uncertainty about the right choice for this parameter.
It's unclear whether the final paper would be the needed horizon length.
For analogous reasoning, consider a model trained to produce equations which faithfully describe reality. These equations tend to be quite short. But I imagine that the horizon length needed to produce them is larger, because you have to keep many things in mind when doing so. Unclear if I'm anthropomorphizing here.
But I think it is >30% likely you can compensate for past over or under estimations.
I'd bet against that at 1:5, i.e., against the proposition that the optimal forecast is not subject to your previous history
This is true in the abstract, but the physical world seems to be such that difficult computations are done for free in the physical substrate (e.g., when you throw a ball, this seems to happen instantaneously, rather than having to wait for a lengthy derivation of the path it traces). This suggests a correct bias in favor of low-complexity theories regardless of their computational cost, at least in physics.
Neat. I have some uncertainty about the evolutionary estimates you are relying on, per here. But neat.
Thanks Tamay!
Seems like this assumes an actual superintelligence, rather than a near-term, scarily capable successor of current ML systems.
Why publish this publicly? Seems like it would improve optimality of training runs?
Software: archivenow
Need: Archiving websites to the internet archive.
Other programs I've tried: The archive.org website, spn, various scripts, various extensions.
archivenow is trusty enough for my use case, and it feels like it fails less often than other alternatives. It was also easy enough to wrap into a bash script and process markdown files. Spn is newer and has parallelism, but I'm not as familiar with it and subjectively it feels like it fails a bit more.
See also: Gwern's setup.
Have prediction markets which pay $100 per share, but only pay out 1% of the time, chosen randomly. If the 1% case happens, then also implement the policy under consideration.
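A minimal sketch of why the $100 payout offsets the 1% chance, so that a share is still priced like an ordinary $1 share on the outcome given the policy. The function name is hypothetical, and it assumes a risk-neutral trader, that shares pay nothing in the other 99% of cases, and that the random draw is independent of the outcome:

```python
def fair_price(p_outcome_given_policy: float,
               payout: float = 100.0,
               p_implemented: float = 0.01) -> float:
    """Expected payout per share for a risk-neutral trader.

    Assumptions (illustrative, not from the comment): shares pay nothing
    when the 1% case does not happen, and the random draw is independent
    of whether the outcome would occur under the policy.
    """
    if not 0.0 <= p_outcome_given_policy <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return p_implemented * payout * p_outcome_given_policy


# A trader who thinks the outcome is 30% likely under the policy
# values a share at 0.01 * $100 * 0.3.
print(fair_price(0.3))
```

Under those assumptions the market price in dollars reads directly as the probability of the outcome conditional on the policy being implemented.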
The issue is that probabilities for something that will either happen or not don't really make sense in a literal way
This is just wrong/frequentist. Search for the "Bayesian" view of probability.
I thought this post was great; thanks for writing it.
Will SMTM answer NCM's post criticizing their Lithium theory? <https://manifold.markets/NuñoSempere/will-smtm-answer-ncms-post-criticiz>
https://metaforecast.org/?query=ETH+merge -> https://polymarket.com/market-group/ethereum-merge-pos -> 59% by October, 87% by November.
The Litany of Might
I strive to take whatever steps may help me best to reach my goals,
I strive to be the very best at what I strive
There is no glory in bygone hopes,
There is no shame in aiming for the win,
there is no choice besides my very best,
to play my top moves and disregard the rest
That as well.
I was assigning less than 3% probability to ~plagiarism being the case, mostly based on Isusr not mentioning that at all in the original post + people seeing similarities where there are none. But seems that I was wrong.
Curious if you know where those people come from?
Sure, see here: https://imgur.com/a/pMR7Qw4
I'm not sure to what extent there's a "forecasting scene", or who is part of it.
There is a forecasting scene, made out of hobbyist forecasters and more hardcore prediction market players, and a bunch of researchers. The best prediction market people tend to have fairly sharp models of the world, particularly around elections. They also have a pretty high willingness to bet.
I've become a bit discouraged by the lack of positive reception for my forecasting newsletter on LessWrong, to which I've been publishing it since April 2020. For example, I thought that Forecasting Newsletter: Looking back at 2021 was excellent. It was very favorably reviewed by Scott Alexander here. I poured a bunch of myself into that newsletter. It got 18 karma.
I haven't bothered crossposting it to LW this month, but it continues on Substack and on the EA forum.
This was hilarious, very fun to read.
Whoops, changed
Odds are an alternative way of presenting probabilities. 50% corresponds to 1:1, 66.66..% corresponds to 1:2, 90% corresponds to 1:9, etc. 33.33..% corresponds to 2:1 odds, or, with the first number as a 1, 1:0.5 odds.
Log odds, or bits, are the logarithm of probabilities expressed as 1:x odds. In some cases, they can be a more natural way of thinking about probabilities (see, e.g., here).
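A short sketch of the conversion, using the 1:x convention from the examples above (so x = p / (1 - p)). The function names are made up for illustration, and taking the logarithm in base 2 is an assumption based on the word "bits":

```python
import math


def prob_to_one_to_x(p: float) -> float:
    """Return x such that probability p corresponds to 1:x odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def log_odds_bits(p: float) -> float:
    """Log odds in bits: the base-2 logarithm of x in 1:x odds (assumed base)."""
    return math.log2(prob_to_one_to_x(p))


for p in (0.5, 2 / 3, 0.9, 1 / 3):
    print(f"{p:.4f} -> 1:{prob_to_one_to_x(p):.2f} odds, "
          f"{log_odds_bits(p):+.2f} bits")
```

One reason this can feel more natural: in log odds, 50% sits at zero, and a probability and its complement (e.g., 66.66..% and 33.33..%) are the same distance from it in opposite directions.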
Couldn't they just get lower interest rate loans elsewhere?
This doesn't mean necessarily that you shouldn't take the bet, but maybe that you should also take the loan.
Thanks Jackson!