Oh, yeah, I totally think what happened here is "we had more rules/guidelines about COVID, which increased the complexity of the rules we had to follow, which caused us to be more inconsistent in applying those rules". I didn't mean to imply that we actually flawlessly followed the rules.
(Please don't leave both top-level reacts and inline reacts of the same type on comments; that produces confusing summary statistics. We might eventually make it literally impossible, but until then, pick one and stick to it.)
Actually, OK, now that I think about it, why don't downvoters have to select the text and provide negative feedback in order to issue a downvote?
Forcing people to write a whole sentence or multiple paragraphs to signal that they think some content is bad would of course have enormous chilling effects on people's ability to express their preferences over content on the site, and reduce the signal we have on content-quality a lot.
Downvoters never reply. I suspect it's because they are obviously afraid I will retaliate against their downvotes with my own...
I would be quite surprised if it's about vote-retaliation. I think it's usually because then people ask follow-up questions and there is usually an asymmetric burden of proof in public communication where interlocutors demand very high levels of precision and shareable evidence, when the actual underlying cognitive process was "my gut says this is bad, and I don't want to see more of this".
That's a nice-to-have, and I do think it reduces the correlation across time and so is a case for having the rate-limit decay with just time. But mostly the point of the rate-limit is to increase the average comment quality on the site without banning a bunch of people (which comes with much stronger chilling effects, since their perspectives are then not represented on the site at all), while still allowing them to complain about the moderation and make the costs to them known.
For COVID in particular we added a specific threshold: "yes, this is news-based, but important enough that we will frontpage the most important posts in this category anyways". I think we announced it somewhere, let me look it up...
No, it's if at least 7 people downvote you in the past 20 comments (on comments that end up net-negative), and the net of all the votes (ignoring your self-votes) on your last 20 comments is below -5 (just using approval-karma, not agreement-karma).
Yeah, it's not crazy, but I am currently against it. If a user only comments occasionally, but always comments in a way that gets downvoted, then I think it's good for them to maintain a low rate-limit. I don't see how calendar time passing gives me evidence that someone's comments will be better and that I now want more of them on the site again.
Hmm, I feel sad about this kind of critique. Like, this comment invokes some very implicit standard for posts, without making it at all explicit. Of course neither this post nor the posts they link to are literally "not based on anything". My guess is you are invoking an implicit standard for work to be "empirical" in order to be "anything", but that also doesn't really make sense since there are a lot of empirical arguments in this article and in the linked articles.
I think highlighting any specific assumption, or even some set of assumptions that you think is fragile, would be helpful. Or being at all concrete about what you would consider work that is "anything". But as it stands, I find it hard to get much out of comments like this.
It's net karma of your last 20 comments or posts. So in order for one person to rate limit you, you would have needed to write 20 comments in a row that got basically no votes from anyone but you, at which point, I probably endorse rate-limiting you (though the zero vote case is a bit tricky, and indeed where I think a lot of the false-positives and false-negatives of the system come from).
I do think the system tends to produce the most false positives when people are engaged in really in-depth comment trees and so write a lot of comments that get no engagement, which then makes things more sensitive to marginal downvotes. I do think "number of downvoters in the last month" or maybe "number of downvoters on your last 20 comments or posts" would help a bunch with that.
In order for a rate limit to trigger the user needs to be downvoted by at least 4 different users for users below 2000 karma, and 7 different users for users above 2000 karma (relevant line of code is here).
I think this failsafe prevents most occasional commenters and posters from being affected by one or two people downvoting them.
I do think it fails to trigger for Roko here, since I think we only check total downvoter count, which helps with new users, but of course over the hundreds of comments Roko has written over the years he has accumulated more than 7 downvoters. I think replacing that failsafe with "downvoters in the last month" is a marginal improvement, and I might make a PR with that.
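To make the rule as described in this thread concrete, here is a hedged sketch. This is not the actual site code (the real implementation is linked above); the names, data shapes, and structure are illustrative, with the thresholds (4 downvoters under 2000 karma, 7 above; net karma of the last 20 comments below -5) taken from the comments here:

```typescript
// Illustrative sketch of the rate-limit rule as described above.
// Not the actual ForumMagnum implementation; names are hypothetical.
interface RecentComment {
  netKarma: number;        // net of all votes, ignoring self-votes (approval karma only)
  downvoters: Set<string>; // distinct user IDs who downvoted this comment
}

function shouldRateLimit(
  last20: RecentComment[],
  userTotalKarma: number
): boolean {
  // Failsafe: require enough distinct downvoters, counted only on
  // comments that ended up net-negative.
  const requiredDownvoters = userTotalKarma < 2000 ? 4 : 7;
  const downvoters = new Set<string>();
  for (const c of last20) {
    if (c.netKarma < 0) {
      for (const id of c.downvoters) downvoters.add(id);
    }
  }
  if (downvoters.size < requiredDownvoters) return false;

  // Main condition: net karma across the last 20 comments below -5.
  const totalKarma = last20.reduce((sum, c) => sum + c.netKarma, 0);
  return totalKarma < -5;
}
```

Note how, in this shape, the failsafe counts downvoters over the whole window with no time bound, which is exactly the part the "downvoters in the last month" change would replace.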
Promoted to curated: I think it's pretty likely a huge fraction of the value of the future will be determined by the question this post is trying to answer, which is how much game theory produces natural solutions to coordination problems, or more generally how much better we should expect systems to get at coordination as they get smarter.
I don't agree with everything in the post, and a few of the characterizations of updatelessness seem a bit off to me (which Eliezer points to a bit in his comment), but I still overall found reading this post quite interesting and valuable. It helped me think about which coordination problems we have a more mechanistic understanding of, i.e. how being smarter and better at game theory might help, and which ones we lack good mechanisms for, which IMO is quite an important question.
"Endpoints are easier to predict than intermediate trajectories" seems like a locally valid and relevant point to bring up. Then there is a valid argument that there are lots of reasons people want to build powerful AGI, and that the argument about the structure of the cognition is intended to apply to an endpoint where those goals are achieved. That is a valid response (if not a knockdown argument) to an interlocutor who is reasoning from local observations and trends.
Maybe you were actually commenting on some earlier section, but I don't see any word games in the section you quoted.
I think the key issue here is that CFAR workshops were optimized around being 4 days long. I think teaching someone numeracy in 4 days is very hard, and the kinds of things you end up being able to convey look different (and are still pretty valuable, but I do think they end up missing a large fraction of the art of rationality).
Hmm, I don't buy it. These two scenes seem very much not like the kind of thing a video game engine could produce:
Look at this frame! I think there is something very slightly off about that face, but the cat hitting the person's face and the person's reaction seem very realistic to me, and IMO qualify as "complex motion and photorealism in the same video".
It also isn't my favorite version of this post that could exist, but it seems like a reasonable point to make, and my guess is a lot of people are expressing their agreement with the title by upvoting.
I think the bot is currently more noticeable than it will be once we have cleared out the 2023/2024 backlog. Usually the bot just makes a comment on a post when it reaches 100 karma, but since we are just starting out, it's leaving a lot of comments at the same time whenever older posts that don't yet have a market get voted on.
The key UI component I care about is actually not the comment (which was just the most natural place to put this information), but the way the post shows up in post-lists:
The karma number gets a slightly different (golden-ish) color, and then you can see the likelihood that it ends up at the top of the review on hover as well as at the top of the post.
The central goal is both to allow us to pull forward a bunch of the benefits of the review, and to create a more natural integration of the review into the everyday experience of the site.
Yeah, not being able to say "negative reward"/"punishment" when you use "reinforcement" seems very costly. I've run into that problem a bunch.
And yeah, that makes sense. I get the "reward implies more model-based thinking" part. I kind of like that distinction, so I am tentatively in favor of using "reward" for more model-based stuff, and "reinforcement" for more policy-gradient-based stuff, if other considerations don't outweigh that.
I don't understand why "reinforcement" is better than "reward"? They both invoke the same image to me.
If you reward someone for a task, they might or might not end up reliably wanting to do the task. Same if you "reinforce" them to do that task. "Reinforce" is more abstract, which seems generally worse for communication, so I would mildly encourage people to use "reward function", but mostly expect other context cues to determine which one is better and don't have a strong general take.
Promoted to curated: I like this post as a relatively self-contained explanation for why AI Alignment is hard. It's not perfect, in that I do think it makes a bunch of inferences implicitly and without calling sufficient attention to them, but I still think overall this seems to me like one of the best things to link to when someone asks about why AI Alignment is an open problem.
Quantum physics. I don't see why it would be indistinguishable from 50%.
Agree that there will be some decoherence. My guess is decoherence would mostly leave particle positions at this scale intact, and if it becomes a huge factor, I would want the question to be settled on the basis of being able to predict which side has higher irreducible uncertainty (i.e. which side had higher amplitude, if I am using that concept correctly).
I do think that helps, but I don't think it helps that much. People don't pursue super naive CDT-ish decision theories.
In practice this shakes out in a feeling of being indebted to whoever pays you, and a pretty strong hesitation to do something that would upset them, even if they weren't going to pay you more anyways. Also, few games are actually only single-iteration. You will likely continue interacting in one way or another, and Arb will interact with other clients, giving this more of an iterated nature.
My guess is it matters a lot, even if people aspire towards independence. I would update if someone has a long track record of clearly neutral-seeming reports for financial compensation, but I think in the absence of such a track record, my prior would be that people are very rarely capable of making strong negative public statements about people who are paying them.
On a more meta point, I have honestly not been all that impressed with the average competency of the AIS funding ecosystem. I don't think it not funding a project is particularly strong evidence that the project is a bad idea.
I made a different call on AISC, but also think this is right. There aren't a lot of players in the funding ecosystem, especially post-FTX there isn't a lot of non-OpenPhil money around, and I generally only weakly update on people succeeding to get funding or failing to get funding.
This is a relatively straightforward question in the context of quantum mechanics. There is a fact of the matter of how much amplitude the world states get where one person wins an election vs. the other one. This question is about how much such decoherence there will be.
In this conception of uncertainty there is no answer to the matter of which of the two outcomes really happens. Both events get some magical reality fluid, as Eliezer would call it.
The original title of this post is "RAND doesn't believe current LLMs are helpful for bioweapons development". I don't think it makes sense to ascribe beliefs this specific to an entity as messy and big as RAND. I changed the title to something that tries to be informative without making as strong a presumption. (For link posts to posts by off-site authors I take more ownership over how a post is titled; I wouldn't change the title if the author of the report had created the post themselves.)
Promoted to curated: I disagree with a bunch of the approach outlined in this post, but I nevertheless found this framing quite helpful for thinking about various AI X-risk related outcomes and plans. I also really appreciate the way this post is written: overall quite approachable while maintaining a relatively high degree of precision in talking about these issues.
The bet would then be over the integral of all the random initializations (and random perturbations). I.e., does a random initialization, in expectation, leave enough information intact for 20 seconds if you change it a tiny bit.
Nah, I don't think that's super relevant here. All the degrees of freedom of the gas are coupled to each other, so the biggest source of chaos can scramble everything just fine.
Hmm, I don't super buy this. For example, this model predicts no standing wave would survive for multiple seconds, but this is trivial to disprove by experiment. So clearly there are degrees of freedom that remain coupled. No waves of substantial magnitude are present in the initialization here, but your argument clearly implies a decay rate for any kind of wave that is too substantial.
I don't think most people die for quantum-randomness reasons. I expect very little of the probability of someone dying to be related to quantum randomness (though my guess is someone might disagree, but then we are just kind of back to the OP's question about how much quantum randomness influences macro-level events).
I thought some about the AI Safety camp for the LTFF. I mostly evaluated the research leads they listed and the resulting teams directly, for the upcoming program (which was I think the virtual one in 2023).
I felt unexcited about almost all the research directions and research leads, and the camp seemed like it was aspiring to be more focused on the research-lead structure than past camps, which increased the weight I assigned to my evaluation of those research directions. I considered for a while funding just the small fraction of research-lead teams I was excited about, but it was only quite a small fraction, and so I recommended against funding it.
It did seem to me that the quality of research leads was markedly worse by my lights than in past years, so I didn't feel comfortable just doing an outside view on the impact of past camps (as the Arb report seems to do). I feel pretty good about the past LTFF grants to the past camps, but my expectations for post-2021 camps were substantially worse than for earlier camps, looking at the inputs and plans, so my expectation of the value of it substantially changed.
Doing it for one particle seems like it would be harder than doing it for all particles, since even if you are highly uncertain about each individual particle, in-aggregate that could still produce a quite high confidence about which side has more particles. So my guess is it matters a lot whether it's almost uniform or not.
Do you know how to interpret "maximum divergence" in this context? Also, IIRC aren't there higher-order exponents that might decay slower? (I just read about this this morning, so I am quite unfamiliar with the literature here)
I do think that even if you change the outcomes of all people using quantum random number generators, this is quite unlikely to flip the outcome of an election. It's just not that many people, and election margins are quite large. There are butterfly effects here, but it seems quite unlikely to me, on priors, that the people who use quantum random number generators explain much of the variance of election outcomes, even if you can correlate their actions somehow.
Oh, I was definitely not thinking of a hole in a gas pipe. I was expecting something much much subtler than that (more like very highly localized temperature-increases which then chain-react). You are dealing with omniscient levels of consequence-control here.
The goal is not to predict the ratio, but just to predict which side will have more atoms (no matter how small the margin). It seems very likely to me that any such calculation would be prohibitively expensive and would approximately require logical omniscience.
To clarify this, we are assuming that without random perturbation, you would get 100% accuracy in predicting which side of the system has more atoms at t=20s. The question is how much of that 100% accuracy you can recover with a very very small unknown perturbation.
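As a toy illustration of why a "very very small unknown perturbation" matters here: in a chaotic system, nearby trajectories separate exponentially, so a tiny perturbation eventually destroys all the information the exact initial condition carried. This sketch uses the one-dimensional logistic map purely as a stand-in for the gas dynamics (the map, starting point, and perturbation size are all illustrative choices, not anything from the discussion above):

```typescript
// Illustrative only: the logistic map in its chaotic regime (r = 4)
// as a stand-in for chaotic gas dynamics.
function logistic(x: number): number {
  return 4 * x * (1 - x);
}

// Distance between a trajectory started at x0 and one started at
// x0 + epsilon, after the given number of iterations.
function divergenceAfter(steps: number, x0: number, epsilon: number): number {
  let a = x0;
  let b = x0 + epsilon;
  for (let i = 0; i < steps; i++) {
    a = logistic(a);
    b = logistic(b);
  }
  return Math.abs(a - b);
}
```

With a perturbation of 1e-12, the separation stays tiny for the first handful of steps, but after a few dozen iterations the two trajectories are essentially decorrelated, which is the sense in which the unperturbed prediction's accuracy cannot be recovered.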
The variance in density will by-default be very low, so the effect size of such structure really doesn't have to be very high. Also, if you can identify multiple such structures which are uncorrelated, you can quickly bootstrap to relatively high confidence.
I don't think "strong correlation" is required. I think you just need a few independent pieces of evidence. Determining such independence is usually really hard to establish, but we are dealing with logical omniscience here.
For example, any set of remotely coherent waves that form in the box with non-negligible magnitude would probably be enough to make a confident prediction. I do think that specific thing is kind of unlikely in a totally randomly initialized box of gas, but I am not confident, and there are many other wave-like patterns that you would find.
Does this undermine the parable? Kinda, I think. If you built a machine that samples from some bizarre inhuman distribution, and then you get bizarre outcomes, then the problem is not really about your wish any more, the problem is that you built a weirdly-sampling machine. (And then we can debate about the extent to which NNs are weirdly-sampling machines, I guess.)
This is roughly how I would interpret the post. Physics itself is a bizarre inhuman distribution, and in general many probability distributions you might want to sample from will be bizarre and inhuman.
Agree that it's then arguable to what degree the optimization pressure of a mature AGI arising from NNs would also be bizarre. My guess is quite bizarre, since a lot of the constraints it will face will be constraints of physics.