Oh yeah. How do I know I'm angry? My back is stiff and starts to hurt.
The second reason that I don't trust the neighbor method is that people just... aren't good at knowing who a majority of their neighbors are voting for. In many cases it's obvious (if over 70% of your neighbors support one candidate or the other, you'll probably know). But if it's 55-45, you probably don't know which direction it's 55-45 in.
My guess is that there's some postprocessing here. E.g. if you assume that the "neighbor" estimate is biased but free of the refusal problem, and you have the same data from the previous election, then you could estimate the shift in opinions and apply it to other polls that ask about your own vote. Or you could ask some additional question like "who did your neighbors vote for in the previous election" and compare that to the real data (ideally per county or so). I would be very surprised if they based the bets just on the raw results.
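Roughly the kind of correction I have in mind, as a toy sketch (the function names and all the numbers below are invented by me, not anything a real pollster necessarily does):

```python
# Toy sketch: calibrate the "neighbor" question against the known result
# of a previous election, then apply the estimated bias to the current
# "neighbor" answers. All numbers are invented for illustration.

def neighbor_bias(neighbor_share_prev, actual_share_prev):
    # How far the "who did your neighbors vote for last time" answers
    # were from the real previous result.
    return neighbor_share_prev - actual_share_prev

def corrected_estimate(neighbor_share_now, bias):
    # Remove the estimated bias from the raw "neighbor" answer.
    return neighbor_share_now - bias

# Example: the neighbor question overstated candidate A by 4 points last
# time, so we shave 4 points off this year's raw neighbor estimate.
bias = neighbor_bias(neighbor_share_prev=0.52, actual_share_prev=0.48)
print(corrected_estimate(neighbor_share_now=0.55, bias=bias))  # ~0.51
```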
My initial answer to your starting question was "I disagree with this statement because they likely used 20 different ways of calculating the p-value and selected the one that was statistically significant". Also https://en.m.wikipedia.org/wiki/Replication_crisis
I don't think there's much value in distinguishing 3000 from 1,000,000, and for any aggregate you'll want to show, this will probably just end up as "later than 2200" or something like that. But yes, this way they can't state that their answer is 1,000,000, which is some downside.
I'm not a big fan of looking at the neighboring answers to decide whether something is a missing answer or a high estimate (it's OK to not want to answer this one question), so some explicit N/A or -1 should be fine.
(Just to make it clear, I'm not saying this is an important problem)
I think you can also ask them to enter some arbitrarily large number (3000 or 10,000 or so) and then just filter out all the numbers above some threshold.
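Something like this toy sketch (the cutoff and the answers are invented):

```python
# Toy sketch: treat any year above a cutoff as "effectively never" and
# drop it before aggregating. The cutoff and data are made up.

CUTOFF = 2200  # arbitrary; anything later counts as a non-answer

answers = [2045, 2070, 2100, 3000, 10000, 2060]  # hypothetical responses
usable = sorted(year for year in answers if year <= CUTOFF)

median = usable[len(usable) // 2]  # crude median of what remains
print(median)  # 2070
```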
By what year do you think the Singularity will occur? Answer such that you think, conditional on the Singularity occurring, there is an even chance of the Singularity falling before or after this year. If you think a singularity is so unlikely you don't even want to condition on it, leave this question blank
You won't be able to distinguish people who think the Singularity is super unlikely from people who just didn't want to answer this question for whatever reason (maybe they forgot, or thought it was boring, or whatever).
Anyway, in such a world some people would probably evolve music that is much more interesting to the public
I wouldn't be so sure.
I think the current diversity of music is largely caused by artists' different lived experiences. You feel something, it is important to you, and you try to express it through music. As long as AIs don't have anything like "unique experiences" on a human scale, I'm not sure they'll be able to create music that is as diverse (and thus as interesting).
I'm assuming the scenario you described, not a personal AI trained on your whole life. With the latter, it could work.
(Note that I'm mostly thinking about small bands, not popular music optimised for wide publicity.)
Are there other arguments for active skepticism about Multipolar value fragility? I don’t have a ton of great stories
The story looks roughly like this to me:
- The agents won't have random values - they will be scattered somewhere around the current status quo.
- Therefore the future we end up in shouldn't be that far from the status quo.
- The current world is pretty good.
(I'm not saying I'm sure this is correct, it's just roughly how I think about that)
This is interesting! Although I think it would be pretty hard to use in a benchmark (you need a set of problems assigned to clearly defined types, and I'm not aware of any such dataset).
There are some papers on "do models know what they know", e.g. https://arxiv.org/abs/2401.13275 or https://arxiv.org/pdf/2401.17882.
A shame Sam didn't read this:
But if you are running on corrupted hardware, then the reflective observation that it seems like a righteous and altruistic act to seize power for yourself—this seeming may not be much evidence for the proposition that seizing power is in fact the action that will most benefit the tribe.
Thanks! Indeed, shard theory fits here pretty well. I didn't think about that while writing the post.
Very good post! I agree with most of what you have written, but I'm not sure about the conclusions, for two main reasons:
- I'm not sure mech interp should be compared to astronomy; I'd say it is more like mechanical engineering. We have JWST because, a long time ago, there were watchmakers, gunsmiths, opticians etc. who didn't care at all about astronomy, yet their advances in unrelated fields made astronomy possible. I think something similar might happen with mech interp - we'll keep creating better and better tools to achieve certain goals, and these goals may in the end turn out to be useless from the alignment point of view, but the tools will not.
- Many people think mech interp is cool and fun. I'm personally not a big fan, but I think it is much more interesting than e.g. governance. If our only perspective were AI safety, this shouldn't matter - but people have many perspectives. There might not really be a choice between "this bunch of junior researchers doing mech interp vs this bunch of junior researchers doing something more useful"; they would just go do something not related to alignment instead. My guess is that the attractiveness of mech interp is the strongest factor in its popularity.
I don't think this answer is in any way related to my question.
This is my fault, because I didn't explain exactly what I mean by "simulation", and the meaning is different from the most common one. Details are in the EDIT in the main post.
I think EU countries might be calculating something like this:

A) Go on with AZ --> people keep talking about killer vaccines, how you should never trust the government, how no sane person should vaccinate, and "blood clots today, what tomorrow?"

B) Halt AZ, then say "we checked carefully, everything's fine, we care, we don't want to kill anyone with our vaccine" and start again --> people will trust the vaccines just a little more.

And in the long term, general trust in vaccines is much more important than a few weeks' delay.

I think you're assuming that scenario A is also better for vaccine trust - maybe, I don't know, but I wouldn't be surprised if European governments saw it the other way.
Also, obviously the best solution is "hey people, let's just stop talking about the goddamned blood clots", but The Virtue of Silence (https://www.lesswrong.com/posts/2brqzQWfmNx5Agdrx/the-virtue-of-silence) is not popular enough : )
A simple way of rating the scenarios above is to describe them as you have and ask humans what they think.
Do you think this is worth doing?
I thought that
- either this was done a billion times and I just missed it
- or this is neither important nor interesting to anyone but me
What's wrong with the AI making life into an RPG (or multiple RPGs)? People like stories, and they like levelling up, collecting stuff, crafting, competing, etc. A story doesn't have to be pure fun (and pure-fun stories are boring anyway).
E.g. Eliezer seems to think it's not the perfect future: "The presence or absence of an external puppet master can affect my valuation of an otherwise fixed outcome. Even if people wouldn't know they were being manipulated, it would matter to my judgment of how well humanity had done with its future. This is an important ethical issue, if you're dealing with agents powerful enough to helpfully tweak people's futures without their knowledge".
Also, you write:
If we want to have a shot at creating a truly enduring culture (of the kind that is needed to get us off this planet and out into the galaxy)
If we really want this, we have to refrain from spending our whole lives playing the best RPG possible.
Never mind AI; they're contradictory even when executed by us. We aren't robots following a prioritised script, and an AI wouldn't be either.
Consider the human rules "you are allowed to lie to someone for the sake of their own utility" and "everyone should be able to take control of their own life". We know that lies about serious things never turn out well, so we lie only about things of little importance, and little lies like "yes grandma, that was very tasty" don't contradict the second rule. This looks different when you are an ultimate deceiver.
Your TitForTatBot:
* never sets self.previous
* even if it were set, the bot would stop cooperating as soon as the opponent played 0
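For reference, a minimal sketch of the fix I mean - the interface (a `play()` method receiving the opponent's previous numeric move) is my guess at the harness, not necessarily the real one:

```python
class TitForTatBot:
    def __init__(self):
        self.previous = None  # the opponent's last move, once known

    def play(self, opponent_move=None):
        # Actually record the opponent's move each round; checking
        # "is not None" instead of truthiness keeps a move of 0 from
        # being treated as "no move yet".
        if opponent_move is not None:
            self.previous = opponent_move
        # First round: cooperate.
        if self.previous is None:
            return 2  # arbitrary "full cooperation" value
        # Otherwise mirror the opponent's last move, so a single 0
        # costs one round of cooperation rather than ending it forever.
        return self.previous
```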
Also, I agree with Zvi's comment: why 2.5 for free? This way one should really concentrate on maxing out in the early stage - is that intended?