Comment by paulfchristiano on The AI Timelines Scam · 2019-07-12T04:32:14.737Z · score: 15 (5 votes) · LW · GW
See Marcus's medium article for more details on how he's been criticized

Skimming that post it seems like he mentions two other incidents (beyond the thread you mention).

Gary Marcus: @Ylecun Now that you have joined the symbol-manipulating club, I challenge you to read my arxiv article Deep Learning: Critical Appraisal carefully and tell me what I actually say there that you disagree with. It might be a lot less than you think.
Yann LeCun: Now that you have joined the gradient-based (deep) learning camp, I challenge you to stop making a career of criticizing it without proposing practical alternatives.
Yann LeCun: Obviously, the ability to criticize is not contingent on proposing alternatives. However, the ability to get credit for a solution to a problem is contingent on proposing a solution to the problem.
Gary Marcus: Folks, let’s stop pretending that the problem of object recognition is solved. Deep learning is part of the solution, but we are obviously still missing something important. Terrific new examples of how much is still be solved here: #AIisHarderThanYouThink
Critic: Nobody is pretending it is solved. However, some people are claiming that people are pretending it is solved. Name me one researcher who is pretending?
Gary Marcus: Go back to Lecun, Bengio and Hinton’s 9 page Nature paper in 2015 and show me one hint there that this kind of error was possible. Or recall initial dismissive reaction to https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf …
Yann LeCun: Yeah, obviously we "pretend" that image recognition is solved, which is why we have a huge team at Facebook "pretending" to work on image recognition. Also why 6500 people "pretended" to attend CVPR 2018.

The most relevant quote from the Nature paper he is criticizing (he's right that it doesn't discuss methods working poorly off distribution):

Unsupervised learning had a catalytic effect in reviving interest in deep learning, but has since been overshadowed by the successes of purely supervised learning. Although we have not focused on it in this Review, we expect unsupervised learning to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object.
Human vision is an active process that sequentially samples the optic array in an intelligent, task-specific way using a small, high-resolution fovea with a large, low-resolution surround. We expect much of the future progress in vision to come from systems that are trained end-to-end and combine ConvNets with RNNs that use reinforcement learning to decide where to look. Systems combining deep learning and reinforcement learning are in their infancy, but they already outperform passive vision systems at classification tasks and produce impressive results in learning to play many different video games.
Natural language understanding is another area in which deep learning is poised to make a large impact over the next few years. We expect systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time.
Ultimately, major progress in artificial intelligence will come about through systems that combine representation learning with complex reasoning. Although deep learning and simple reasoning have been used for speech and handwriting recognition for a long time, new paradigms are needed to replace rule-based manipulation of symbolic expressions by operations on large vectors.
Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-11T20:02:39.369Z · score: 4 (2 votes) · LW · GW

Housing markets move because they depend on the expectation of future rents. If I want to expose myself to future rents, I have to take on volatility in the expectation of future rents, that's how the game goes.

Comment by paulfchristiano on The AI Timelines Scam · 2019-07-11T18:31:35.511Z · score: 55 (15 votes) · LW · GW
in part because I don't have much to say on this issue that Gary Marcus hasn't already said.

It would be interesting to know which particular arguments made by Gary Marcus you agree with, and how you think they relate to arguments about timelines.

In this preliminary doc, it seems like most of the disagreement is driven by saying there is a 99% probability that training a human-level AI would take more than 10,000x more lifetimes than AlphaZero took games of go (while I'd be at more like 50%, and have maybe 5-10% chance that it will take many fewer lifetimes). Section 2.0.2 admits this is mostly guesswork, but ends up very confident the number isn't small. It's not clear where that particular number comes from; the only evidence gestured at is "the input is a lot bigger, so it will take a lot more lifetimes," which doesn't seem to agree with our experience so far or have much conceptual justification. (I guess the point is that the space of functions is much bigger? But if comparing the size of the space of functions, why not directly count parameters?) And why is this a lower bound?

Overall this seems like a place you disagree confidently with many people who entertain shorter timelines, and it seems unrelated to anything Gary Marcus says.

Comment by paulfchristiano on The AI Timelines Scam · 2019-07-11T08:42:48.090Z · score: 134 (43 votes) · LW · GW

I agree with:

• Most people trying to figure out what's true should be mostly trying to develop views on the basis of public information and not giving too much weight to supposed secret information.
• It's good to react skeptically to someone claiming "we have secret information implying that what we are doing is super important."
• Understanding the sociopolitical situation seems like a worthwhile step in informing views about AI.
• It would be wild if 73% of tech executives thought AGI would be developed in the next 10 years. (And independent of the truth of that claim, people do have a lot of wild views about automation.)

I disagree with:

• Norms of discourse in the broader community are significantly biased towards short timelines. The actual evidence in this post seems thin and cherry-picked. I think the best evidence is the a priori argument "you'd expect to be biased towards short timelines given that it makes our work seem more important." I think that's good as far as it goes but the conclusion is overstated here.
• "Whistleblowers" about long timelines are ostracized or discredited. Again, the evidence in your post seems thin and cherry-picked, and your contemporary example seems wrong to me (I commented separately). It seems like most people complaining about deep learning or short timelines have a good time in the AI community, and people with the "AGI in 20 years" view are regarded much more poorly within academia and most parts of industry. This could be about different fora and communities being in different equilibria, but I'm not really sure how that's compatible with "ostracizing." (It feels like you are probably mistaken about the tenor of discussions in the AI community.)
• That 73% of tech executives thought AGI would be developed in the next 10 years. Willing to bet against the quoted survey: the white paper is thin on details and leaves lots of wiggle room for chicanery, while the project seems thoroughly optimized to make AI seem like a big deal soon. The claim also just doesn't seem to match my experience with anyone who might be called tech executives (though I don't know how they constructed the group).
Comment by paulfchristiano on The AI Timelines Scam · 2019-07-11T07:50:18.378Z · score: 55 (21 votes) · LW · GW

For reference, the Gary Marcus tweet in question is:

“I’m not saying I want to forget deep learning... But we need to be able to extend it to do things like reasoning, learning causality, and exploring the world .” - Yoshua Bengio, not unlike what I have been saying since 2012 in The New Yorker.

I think Zack Lipton objected to this tweet because it appears to be trying to claim priority. (You might have thought it's ambiguous whether he's claiming priority, but he clarifies in the thread: "But I did say this stuff first, in 2001, 2012 etc?") The tweet and his writings more generally imply that people in the field have recently changed their view to agree with him, but many people in the field object strongly to this characterization.

The tweet is mostly just saying "I told you so." That seems like a fine time for people to criticize him about making a land grab rather than engaging on the object level, since the tweet doesn't have much object-level content. For example:

"Saying it louder ≠ saying it first. You can't claim credit for differentiating between reasoning and pattern recognition." [...] is essentially a claim that everybody knows that deep learning can't do reasoning. But, this is essentially admitting that Marcus is correct, while still criticizing him for saying it.

Hopefully Zack's argument makes more sense if you view it as a response to Gary Marcus claiming priority. Which is what Gary Marcus was doing and clearly what Zack is responding to. This is not a substitute for engagement on the object level. Saying "someone else, and in fact many people in the relevant scientific field, already understood this point" is an excellent response to someone who's trying to claim credit for the point.

There are reasonable points to make about social epistemology here, but I think you're overclaiming about the treatment of critics, and that this thread in particular is a bad example to point to. It also seems like you may be mistaken about some of the context. (Zack Lipton has no love for short-timelines-pushers and isn't shy about it. He's annoyed at Gary Marcus for making bad arguments and claiming unwarranted credit, which really is independent of whether some related claims are true.)

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-08T22:03:05.446Z · score: 8 (4 votes) · LW · GW

I'm not really making a claim about momentum, I'm just skeptical of your basic analysis.

Real 30-year interest rates are ~1%, taxes are ~1%, and I think maintenance averages ~1%. So that's ~3%/year total cost, which seems comparable to rent in areas like SF.

On top of that I think historical appreciation is around 1% (we should expect it to be somewhere between "no growth" and "land stays a constant fraction of GDP"). So that looks like buying should ballpark 10-30% cheaper if you ignore all the transaction costs, presumably because rent prices are factoring in a bunch of frictions. That sounds plausible enough to me, but in reality I expect this is a complicated mess that you can't easily sort out in a short blog post and varies from area to area.
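The arithmetic above can be sketched in a few lines. This is only a rough illustration: the 1% figures are the ones quoted in the comment, while the price-to-rent ratios in the loop are hypothetical values chosen for the example, and the function names are made up here.

```python
def annual_owning_cost_rate(real_interest=0.01, taxes=0.01,
                            maintenance=0.01, appreciation=0.01):
    """Net yearly cost of owning, as a fraction of the house price."""
    gross_cost = real_interest + taxes + maintenance
    return gross_cost - appreciation


def buying_discount(price_to_rent_ratio, **owning_kwargs):
    """Fraction by which owning is cheaper than renting.

    price_to_rent_ratio is house price divided by one year of rent
    (an assumed input, not a number taken from the comment).
    A negative result means renting is cheaper.
    """
    rent_rate = 1.0 / price_to_rent_ratio
    return 1.0 - annual_owning_cost_rate(**owning_kwargs) / rent_rate


# Transaction costs are ignored, as in the comment.
for ratio in (35, 40, 45):
    print(ratio, round(buying_discount(ratio), 2))
```

The result is very sensitive to the assumed ratio, which is part of why the numbers need to be pinned down per market rather than argued in the abstract.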

If you want to argue for "buying is usually a terrible idea, investors are idiots or speculators" I think you should be getting into the actual numbers.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-08T21:54:44.365Z · score: 4 (2 votes) · LW · GW
I'm claiming that with covariance data such a thing could be constructed.

I'll bet against.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-07T16:43:31.243Z · score: 2 (1 votes) · LW · GW
I meant that you can get a better deal. You can get something that is only marginally more correlated but much much cheaper in the market.

What's the alternative?

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T20:26:46.000Z · score: 8 (4 votes) · LW · GW
This assumes rational markets. It's like buying a bar because you assume bar owners are making money, or buying an airline because you assume airline owners are making money. [...] that doesn't matter if you're playing against idiots with incoherent time preferences in popular markets. You'll need to outbid someone making a poor financial decision.

If you want to convince the reader "it turns out most investors are getting a bad deal, and are only doing it because they are idiots" then I think the burden of proof is switched: now you are the one claiming that the reader should be convinced to disagree with investors who've thought about it a lot more.

(I also don't think you are right on the object level, but it's a bit hard to say without seeing the analysis spelled out. There's no way that nominal rather than real interest rates are the important thing for decision-making in this case---if we just increase inflation and mortgage rates and rent increase caps by 5%, that should clearly have no impact on the buy vs. rent decision. You can convert the appreciation into more cashflow if you want to.)

Why do landlords do it if they aren't building meaningful equity at super high rent to buy ratios? Leveraged speculation (aka high stakes gambling).

The basic argument in favor is: if I want to keep living in the area, then I'm going to have to pay if the rent goes up. Not buying is more of a gamble than buying: if you buy you know that you are just going to keep making mortgage payments + taxes + maintenance, while if you don't buy you have no idea what your expenses are going to be in twenty years' time. (But I totally agree that buying a house depends on knowing where you want to live for a reasonably long time. I also agree that e.g. when housing prices go up your wages are likely to go up, so you don't want to totally hedge this out.)

the optimal hedge is whatever grows your money the fastest in a way that isn't correlated with your other cash flows + assets.

The optimal hedge is anticorrelated with your other cashflows, not uncorrelated. In this case, future rent is one of your biggest cashflows.
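A toy simulation makes the point concrete. It is a minimal sketch under strong assumptions (unit-variance Gaussian shocks, a single period, no expected returns), and every name in it is hypothetical: rent is modeled as a negative cashflow, and the asset is given a chosen correlation with that cashflow.

```python
import random
import statistics


def net_position_variance(hedge_correlation, n_scenarios=5000, seed=0):
    """Variance of (rent cashflow + asset payoff) across random scenarios.

    hedge_correlation is the correlation between the asset's payoff and
    your rent cashflow, where the cashflow is negative rent (an outflow).
    """
    if not -1.0 <= hedge_correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    rng = random.Random(seed)
    residual_weight = (1.0 - hedge_correlation ** 2) ** 0.5
    nets = []
    for _ in range(n_scenarios):
        rent_shock = rng.gauss(0.0, 1.0)
        cashflow_shock = -rent_shock          # higher rent means a worse cashflow
        noise = rng.gauss(0.0, 1.0)
        asset_shock = hedge_correlation * cashflow_shock + residual_weight * noise
        nets.append(cashflow_shock + asset_shock)
    return statistics.pvariance(nets)


for rho in (-1.0, 0.0, 1.0):
    print(rho, round(net_position_variance(rho), 2))
```

In this setup the uncorrelated asset adds its own variance on top of the rent risk, while the perfectly anticorrelated one (an asset that rises exactly when rent rises) cancels it.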

A portfolio that is better hedged against local markets and less volatile than housing can be constructed and will grow faster than rent reliably.

What's better correlated with my future rent than an asset whose value is exactly equal to my future rent payments?

It seems like your argument should come down to one of:

• You don't know where you'll be living in the future. I think this is really the core question---how confident should you be about where you are living before buying makes sense? My guess is that it's in the ballpark of the rent-to-buy ratio. I think people could make better decisions by (i) pinning down that number for the markets they care about, (ii) thinking more clearly about that question, e.g. correcting for overconfidence.
• You may be tied down to an area, but should be willing to move out of your house if it becomes more expensive, so you shouldn't hedge against idiosyncratic risk in this exact neighborhood. This depends on how attached people get to houses and how large the idiosyncratic real estate risk is. (Maybe this is what you mean by "hedged against local markets"?) I don't know the numbers here, but I don't think it's obvious.
Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T20:02:38.175Z · score: 2 (1 votes) · LW · GW
100% exposure to local real estate is different than 100% exposure to a particular house

Sure. If you are happy living in that house it doesn't seem like a big deal. Maybe you could do better by being willing to move out of that house and into another house if this one became more expensive, but for lots of people that's a pain in the ass (and the idiosyncratic volatility is a small enough deal) that they wouldn't want to do it and would prefer to just hedge against the risk.

It might be the best hedge available due to taxes and leverage for some individuals but only once they have enough money that they won't be screwed if their housing investment goes sideways.

If you can make your mortgage payments, how do you end up screwed if the investment goes sideways? In general, how does this change the basic calculus: I need to pay rent in this house, so I want to hedge against changes in rent?

I agree we can have a quantitative discussion about what's optimal, but to do that you have to actually engage with the quantitative question.

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T16:51:28.314Z · score: 8 (4 votes) · LW · GW
First let's address the idea that renting is 'throwing money away'. This ignores the opportunity cost of investing extra income and the lump sum of the down payment into a house instead of the stock market. People spend on average 50% more money on mortgages, taxes, and fees than they spend on rent.

People who rent houses make money from doing it---landlords aren't in it out of the goodness of their hearts. I guess maybe you can get this kind of gap if you use nominal rather than real interest rates? But that doesn't seem financially relevant.

To first order you are going to be paying the same amount whether you buy or rent, unless the market is irrational. The most important corrections are (i) you'll be making a big investment in housing and the sales price will incorporate the value of that investment, so you need to think about whether you want that investment, (ii) there are transaction costs and frictions from both buying and renting, and the buying transaction costs will be larger unless you are living in one place for a reasonably long time.

(Maybe 10 years is the right ballpark for amortizing out transaction costs of buying---e.g. if those costs are 10% per sale, then over ten years that's 1% of the value of the house per year. At rent to buy of 30, that's about one third of rent. Some of the buying overhead will be embedded in rent, since landlords need to break even. And I'd bet the total inefficiencies of renting are in the ballpark of 10-20% of rent.)
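The amortization in that parenthetical works out as follows. The defaults are the numbers from the comment (10% per sale, ten years, rent to buy of 30); the function name is just for illustration.

```python
def transaction_cost_as_share_of_rent(cost_per_sale=0.10, years_held=10,
                                      rent_to_buy=30):
    """Amortized buying transaction cost, expressed as a fraction of yearly rent.

    cost_per_sale: transaction cost as a fraction of the house value.
    rent_to_buy:   house value divided by one year of rent.
    """
    yearly_cost = cost_per_sale / years_held   # fraction of house value per year
    yearly_rent = 1.0 / rent_to_buy            # fraction of house value per year
    return yearly_cost / yearly_rent


print(round(transaction_cost_as_share_of_rent(), 2))              # ten-year stay
print(round(transaction_cost_as_share_of_rent(years_held=3), 2))  # three-year stay
```

Shortening the stay to three years makes the amortized transaction cost roughly equal to a full year of rent per year, which is why the expected length of stay dominates the decision.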

So if you are living long enough to amortize out the buying transaction costs, now we are comparing the investment in housing to the investment in the stock market. The most important argument here is that if rent goes up your obligations go up, so you want investments that go up in that case as well. If you are staying in one place, owning a house is basically the optimal hedge against future rent obligations (since an increase in expected future rents will increase your wealth by exactly the same amount). I think there are a lot of caveats to that argument, but it's a reasonable first order approximation and it's a mistake to think about the finances of buying a house without considering it. This depends mostly on living in the same real estate market for a long time, not living in exactly the same house (though it's even better if you are living in the exact same house).

Comment by paulfchristiano on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T16:30:40.169Z · score: 12 (6 votes) · LW · GW
When thinking about whether to invest your money into a house or something else, typically you want to decouple that decision from your living situation and see if it still makes sense. Regardless of where I live---given my net worth, does it make sense for $XXX,000 of my investment portfolio to be tied into this specific piece of real estate?

The usual financial justification for owning a house is that you are "naturally short housing." If you expect to live in the same place your whole life, then you'll owe more money when rent goes up. So you want investments that go up in value when rent goes up and down when rent goes down. Being 100% exposed to the local real estate market is roughly optimal (though less if you aren't certain you are living there, or if you are more likely to move neighborhoods and you don't want exposure to this particular house).

I don't think you can decouple these two.

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-04T04:59:05.720Z · score: 3 (2 votes) · LW · GW
in the sequential case manipulation can only change the distribution of inputs that the Oracles receive, but it doesn't improve performance on any particular given input

Why is that? Doesn't my behavior on question #1 affect both question #2 and its answer?

Also, this feels like a doomed game to me---I think we should be trying to reason from selection rather than relying on more speculative claims about incentives.

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T22:37:49.549Z · score: 2 (1 votes) · LW · GW

I mean, if the oracle hasn't yet looked at the question they could use simulation warfare to cause the preceding oracles to take actions that lead to them getting given easier questions. Once you start unbarring all holds, stuff gets wild.

Comment by paulfchristiano on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T17:11:59.250Z · score: 7 (4 votes) · LW · GW

I'm not sure I understand the concern. Isn't the oracle answering each question to maximize its payoff on that question in event of an erasure? So it doesn't matter if you ask it other questions during the evaluation period. (If you like, you can say that you are asking them to other oracles---or is there some way that an oracle is a distinguished part of the environment?)

If the oracle cares about its own performance in a broader sense, rather than just performance on the current question, then don't we have a problem anyway? E.g. if you ask it question 1, it will be incentivized to make it get an easier question 2? For example, if you are concerned about coordination amongst different instances of the oracle, this seems like it's a problem regardless.

I guess you can construct a model where the oracle does what you want, but only if you don't ask any other oracles questions during the evaluation period, but it's not clear to me how you would end up in that situation and at that point it seems worth trying to flesh out a more precise model.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:57:53.820Z · score: 2 (1 votes) · LW · GW

Determining whether aligned AI is impossible seems harder than determining whether there is any hope for a knowably-aligned AI.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:56:16.837Z · score: 3 (2 votes) · LW · GW

The point of working in this setting is mostly to constrain the search space or make it easier to construct an impossibility argument.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:55:03.459Z · score: 2 (1 votes) · LW · GW
When you say "test" do you mean testing by writing a single program that outputs whether the model performs badly on a given input (for any input)?
If so, I'm concerned that we won't be able to write such a program.

That's the hope. (Though I assume we mostly get it by an application of Opt, or more efficiently by modifying our original invocation of Opt to return a program with some useful auxiliary functions, rather than by writing it by hand.)

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-02T21:49:19.102Z · score: 2 (1 votes) · LW · GW

If dropping competitiveness, what counts as a solution? Is "imitate a human, but run it fast" fair game? We could try to hash out the details in something along those lines, and I think that's worthwhile, but I don't think it's a top priority and I don't think the difficulties will end up being that similar. I think it may be productive to relax the competitiveness requirement (e.g. to allow solutions that definitely have at most a polynomial slowdown), but probably not a good idea to eliminate it altogether.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-07-01T16:58:57.727Z · score: 3 (2 votes) · LW · GW

That seems fine though. If the model behaves badly on any input we can test that. If the model wants to behave well on every input, then we're happy. If it wants to behave badly on some input, we'll catch it.

Are you concerned that we can't test whether the model is behaving badly on a particular input? I think if you have that problem you are also in trouble for outer alignment.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-30T19:10:17.028Z · score: 2 (1 votes) · LW · GW

The idea is that "the task being trained" is something like: 50% what you care about at the object level, 50% the subtasks that occur in the evaluation process. The model may sometimes get worse at the evaluation process, or at the object level task, you are just trying to optimize some weighted combination.

There are a bunch of distinct difficulties here. One is that the distribution of "subtasks that occur in the evaluation process" is nonstationary. Another is that we need to set up the game so that doing both evaluation and the object level task is not-much-harder than just doing the object level task.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-30T17:13:16.556Z · score: 2 (1 votes) · LW · GW

I think that when a design problem is impossible, there is often an argument for why it's impossible. Certainly that's not obvious though, and you might just be out of luck. (That said, it's also not obvious that problems in are easier to solve than , both contain problems you just can't solve and so you are relying on extra structural facts in either case.)

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-30T17:10:08.000Z · score: 2 (1 votes) · LW · GW

If we can test whether the model is behaving badly on a given input, then we can use Opt to search for any input where the model behaves badly. So we can end up with a system that works well on distribution and doesn't work poorly off distribution. If it's possible to handle outer alignment in this setting, I expect inner alignment will also be possible. (Though it's not obvious, since you might take longer to learn given an unaligned learned optimizer.)
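As a toy illustration of "use Opt to search for any input where the model behaves badly": the sketch below stands in for Opt with brute-force search over n-bit inputs, which takes 2^n evaluations and so is nothing like the efficient Opt assumed in the post. The helper names and the parity example are hypothetical.

```python
from itertools import product


def opt(utility, n_bits):
    """Brute-force stand-in for Opt: the n-bit input that maximizes utility."""
    return max(product((0, 1), repeat=n_bits), key=utility)


def find_bad_input(model, is_bad, n_bits):
    """Return an input on which the model behaves badly, or None if there is none.

    is_bad(x, y) is the assumed test of whether output y is bad on input x.
    """
    worst = opt(lambda x: 1 if is_bad(x, model(x)) else 0, n_bits)
    return worst if is_bad(worst, model(worst)) else None


def correct_parity(x):
    return sum(x) % 2


def buggy_parity(x):
    # Behaves well everywhere except on one rare input.
    return 0 if x == (1, 1, 1) else sum(x) % 2


def parity_is_wrong(x, y):
    return y != sum(x) % 2


print(find_bad_input(buggy_parity, parity_is_wrong, 3))
print(find_bad_input(correct_parity, parity_is_wrong, 3))
```

The search only helps to the extent that the badness test itself can be written down, which is the outer alignment question raised in the thread.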

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-30T17:06:30.219Z · score: 2 (1 votes) · LW · GW
Are you describing it as a problem that you (or others you already have in mind such as people at OpenAI) will work on, or are you putting it out there for people looking for a problem to attack?

I will work on it at least a little, I'm encouraging others to think about it.

So, something like, when training the next level agent in IDA, you initialize the model parameters with the current parameters rather than random parameters?

You don't even need to explicitly maintain separate levels of agent. You just always use the current model to compute the rewards, and use that reward function to compute a gradient and update.

Comment by paulfchristiano on Aligning a toy model of optimization · 2019-06-29T00:24:17.649Z · score: 18 (7 votes) · LW · GW
Most of this post seems to be simplified/streamlined versions of what you've written before.

I mostly want to call attention to this similar-but-slightly-simpler problem of aligning Opt. Most of the content is pretty similar to what I've described in the ML case, simplified partly as an exposition thing, and partly because everything is simpler for Opt. I want this problem statement to stand relatively independently since I think it can be worked on relatively independently (especially if it ends up being an impossibility argument).

"training a sequence of agents" is bad because it might require multiple invocations of Opt so it's not competitive with an unaligned AI that uses Opt a small constant number of times?

Yes.

It could be OK if the number of bits increased exponentially with each invocation (if an bit policy is overseen by a bunch of copies of an bit policy, then the total cost is 2x). I think it's more likely you'll just avoid doing anything like amplification.

Can you explain more how iterated amplification exploits properties of local search?

At each step of local search you have some current policy and you are going to produce a new one (e.g. by taking a gradient descent step, or by generating a bunch of perturbations). You can use the current policy to help define the objective for the next one, rather than needing to make a whole separate call to Opt.
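A schematic version of that loop, as a sketch only: the policy here is a single number, and the "overseer" built from the current policy is a made-up stand-in that can only recognize improvements slightly beyond its own level. None of this is the actual amplification scheme; it just shows the objective being rebuilt from the current policy at every step instead of calling a separate optimizer.

```python
import random


def local_search(initial_policy, make_objective, perturb,
                 steps=100, n_candidates=20, seed=0):
    """Hill-climb, redefining the objective from the current policy each step."""
    rng = random.Random(seed)
    policy = initial_policy
    for _ in range(steps):
        objective = make_objective(policy)   # depends on the current policy
        candidates = [policy] + [perturb(policy, rng) for _ in range(n_candidates)]
        policy = max(candidates, key=objective)
    return policy


def make_toy_objective(current, goal=10.0, reach=0.5):
    """Hypothetical overseer: rewards candidates near a target just beyond `current`."""
    target = min(current + reach, goal)
    return lambda candidate: -abs(candidate - target)


def toy_perturb(policy, rng):
    return policy + rng.uniform(-1.0, 1.0)


final_policy = local_search(0.0, make_toy_objective, toy_perturb)
print(round(final_policy, 2))
```

Each individual step only asks for a small improvement that the current policy can evaluate, but the long sequence of steps moves the policy far from where it started.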

Is this because (or one way to think about it is) Opt corresponds to NP and iterated amplification or debate correspond to something higher in the polynomial hierarchy?

Yes.

You described Opt as returning the argmax for U using only n times more compute than U, without any caveats. Surely this isn't actually possible because in the worst case it does require 2^n times more than U? So the only way to be competitive with the Opt-based benchmark is to make use of Opt as a black box?

Yes.

Why is it easier? (If you treat them both as black boxes, the difficulty should be the same?) Is it because we don't have to treat the slow naive version of Opt as a black box that we have to make use of, and therefore there are more things we can do to try to be competitive with it?

Yes.

Why wouldn't just be impossible? Is it because ML occupies a different point on the speed/capability Pareto frontier and it might be easier to build an aligned AI near that point (compared to the point that the really slow AI occupies) ?

Because of ML involving local search. It seems unhappy because a single step of local search feels very similar to "generate a bunch of options and see which one is best," so it would be surprising if you could align local search without being able to align "generate a bunch of options and see which one is best." But it might be helpful that you are doing a long sequence of steps.

## Aligning a toy model of optimization

2019-06-28T20:23:51.337Z · score: 52 (17 votes)
Comment by paulfchristiano on Prosaic AI alignment · 2019-06-04T15:56:36.935Z · score: 5 (3 votes) · LW · GW

I'm imagining the first marginal unit of effort, which you'd apply to the most likely possibility. Its expected impact is reduced by that highest probability.

If you get unlucky, then your actual impact might be radically lower than if you had known what to work on.

Comment by paulfchristiano on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-06-04T02:15:16.195Z · score: 6 (3 votes) · LW · GW
An example would be my Matrix Multiplication example (https://youtu.be/5DDdBHsDI-Y). Here, a series of 4 key insights turn the problem from requiring a decade, to a year, to a day, to a second.

In fact Strassen's algorithm is worse than textbook matrix multiplication for most reasonably sized matrices, including all matrices that could be multiplied in the 70s. Even many decades later the gains are still pretty small (and it's only worth doing for unusually giant matrix multiplies). As far as I am aware nothing more complicated than Strassen's algorithm is ever used in practice. So it doesn't seem like an example of a key insight enabling a problem to be solved.
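A crude operation count gives some feel for this. The sketch below counts scalar multiplications and additions only, assuming square matrices with power-of-two size and Strassen recursing all the way down to 1x1 blocks with 18 block additions per level. It ignores memory traffic, cache behavior, and the hybrid cutoffs real implementations use, so it is an illustration rather than a benchmark.

```python
def naive_ops(n):
    """Scalar multiplications plus additions for textbook n x n multiplication."""
    return n ** 3 + n ** 2 * (n - 1)


def strassen_ops(n):
    """Scalar operations for Strassen's algorithm recursing down to 1 x 1 blocks."""
    if n == 1:
        return 1
    half = n // 2
    # 7 recursive products plus 18 additions/subtractions of half-size blocks.
    return 7 * strassen_ops(half) + 18 * half * half


def first_power_of_two_where_strassen_wins(max_exponent=20):
    """Smallest n = 2**k at which the Strassen count drops below the naive count."""
    for k in range(max_exponent + 1):
        n = 2 ** k
        if strassen_ops(n) < naive_ops(n):
            return n
    return None


for n in (2, 64, 512, 1024):
    print(n, naive_ops(n), strassen_ops(n))
print(first_power_of_two_where_strassen_wins())
```

Even under this idealized count the asymptotically faster algorithm needs matrices with around a thousand rows before it does less arithmetic, and the advantage just past that point is modest.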

We could imagine an alternate reality in which large matrix multiplications became possible only after we discovered Strassen's algorithm. But I think there is a reason that reality is alternate.

Overall I think difficult theory and clever insights are sometimes critical, perhaps often enough to more than justify our society's tiny investment in them, but it's worth having a sense of how exceptional these cases are.

Comment by paulfchristiano on Disincentives for participating on LW/AF · 2019-05-12T06:34:48.973Z · score: 23 (10 votes) · LW · GW

I don't comment more because writing comments takes time. I think that in person discussions tend to add more value per minute. (I expect your post is targeted at people who comment less than I do, but the reasons may be similar.)

I can imagine getting more mileage out of quick comments, which would necessarily be short and unpolished. I'm less likely to do that because I feel like fast comments will often reflect poorly on me for a variety of reasons: they would have frequent and sometimes consequential errors (that would be excused in a short in-person discussion because of time), in general hastily-written comments send a negative signal (better people write better comments, faster comments are worse, full model left as exercise for reader), I'd frequently leave errors uncorrected or threads of conversation dropped, and so on.

Comment by paulfchristiano on Announcement: AI alignment prize round 4 winners · 2019-04-19T16:43:32.257Z · score: 12 (3 votes) · LW · GW
Is there another way to spend money that seems clearly more cost-effective at this point, and if so what?

To be clear, I still think this is a good way to spend money. I think the main cost is time.

Comment by paulfchristiano on What failure looks like · 2019-03-27T16:18:55.133Z · score: 18 (6 votes) · LW · GW

I do agree there was a miscommunication about the end state, and that language like "lots of obvious destruction" is an understatement.

I do still endorse "military leaders might issue an order and find it is ignored" (or total collapse of society) as basically accurate and not an understatement.

Comment by paulfchristiano on What failure looks like · 2019-03-27T16:12:38.892Z · score: 11 (7 votes) · LW · GW

My median outcome is that people solve intent alignment well enough to avoid catastrophe. Amongst the cases where we fail, my median outcome is that people solve enough of alignment that they can avoid the most overt failures, like literally compromising sensors and killing people (at least for a long subjective time), and can build AIs that help defend them from other AIs. That problem seems radically easier---most plausible paths to corrupting sensors involve intermediate stages with hints of corruption that could be recognized by a weaker AI (and hence generate low reward). Eventually this will break down, but it seems quite late.

very confident that no AI company would implement something with this vulnerability?

The story doesn't depend on "no AI company" implementing something that behaves badly, it depends on people having access to AI that behaves well.

Also "very confident" seems different from "most likely failure scenario."

Haven't you yourself written about the failure modes of 'do things predicted to lead to videos that people rate as acceptable' where the attack involves surreptitiously reprogramming the camera to get optimal videos (including weird engineered videos designed to optimize on infelicities in the learned objective)?

That's a description of the problem / the behavior of the unaligned benchmark, not the most likely outcome (since I think the problem is most likely to be solved). We may have a difference in view between a distribution over outcomes that is slanted towards "everything goes well" such that the most realistic failures are the ones that are the closest calls, vs. a distribution slanted towards "everything goes badly" such that the most realistic failures are the complete and total ones where you weren't even close.

Because it definitely seems that Vox got the impression from it that there is never a robot army takeover in the scenario, not that it's slightly preceded by camera hacking.

I agree there is a robot takeover shortly later in objective time (mostly because of the singularity). Exactly how long it takes mostly depends on how early things go off the rails w.r.t. alignment; perhaps you have O(year).

Comment by paulfchristiano on What failure looks like · 2019-03-27T01:53:23.423Z · score: 7 (4 votes) · LW · GW

I agree that robot armies are an important aspect of part II.

In part I, where our only problem is specifying goals, I don't actually think robot armies are a short-term concern. I think we can probably build systems that really do avoid killing people, e.g. by using straightforward versions of "do things that are predicted to lead to videos that people rate as acceptable," and that at the point when things have gone off the rails those videos still look fine (and to understand that there is a deep problem at that point you need to engage with complicated facts about the situation that are beyond human comprehension, not things like "are the robots killing people?"). I'm not visualizing the case where no one does anything to try to make their AI safe, I'm imagining the most probable cases where people fail.

I think this is an important point, because I think much discussion of AI safety imagines "How can we give our AIs an objective which ensures it won't go around killing everyone," and I think that's really not the important or interesting part of specifying an objective (and so leads people to be reasonably optimistic about solutions that I regard as obviously totally inadequate). I think you should only be concerned about your AI killing everyone because of inner alignment / optimization daemons.

That said, I do expect possibly-catastrophic AI to come only shortly before the singularity (in calendar time) and so the situation "humans aren't able to steer the trajectory of society" probably gets worse pretty quickly. I assume we are on the same page here.

In that sense Part I is misleading. It describes the part of the trajectory where I think the action is, the last moments where we could have actually done something to avoid doom, but from the perspective of an onlooker that period could be pretty brief. If there is a Dyson sphere in 2050 it's not clear that anyone really cares what happened during 2048-2049. I think the worst offender is the last sentence of Part I ("By the time we spread through the stars...")

Part I has this focus because (i) that's where I think the action is---by the time you have robot armies killing everyone the ship is so sailed, I think a reasonable common-sense viewpoint would acknowledge this by reacting with incredulity to the "robots kill everyone" scenario, and would correctly place the "blame" on the point where everything got completely out of control even though there weren't actually robot armies yet, (ii) the alternative visualization leads people to seriously underestimate the difficulty of the alignment problem, (iii) I was trying to describe the part of the picture which is reasonably accurate regardless of my views on the singularity.

Comment by paulfchristiano on What failure looks like · 2019-03-27T01:35:45.550Z · score: 10 (7 votes) · LW · GW
The Vox article also mistakes the source of influence-seeking patterns to be about social influence rather than 'systems that try to increase in power and numbers tend to do so, so are selected for if we accidentally or intentionally produce them and don't effectively weed them out; this is why living things are adapted to survive and expand; such desires motivate conflict with humans when power and reproduction can be obtained by conflict with humans, which can look like robot armies taking control.'

Yes, I agree the Vox article made this mistake. Me saying "influence" probably gives people the wrong idea so I should change that---I'm including "controls the military" as a central example, but it's not what comes to mind when you hear "influence." I like "influence" more than "power" because it's more specific, captures what we actually care about, and is less likely to lead to a debate about "what is power anyway."

In general I think the Vox article's discussion of Part II has some problems, and the discussion of Part I is closer to the mark. (Part I is also more in line with the narrative of the article, since Part II really is more like Terminator. I'm not sure which way the causality goes here though, i.e. whether they ended up with that narrative based on misunderstandings about Part II or whether they framed Part II in a way that made it more consistent with the narrative, maybe having been inspired to write the piece based on Part I.)

There is a different mistake with the same flavor, later in the Vox article: "But eventually, the algorithms’ incentives to expand influence might start to overtake their incentives to achieve the specified goal. That, in turn, makes the AI system worse at achieving its intended goal, which increases the odds of some terrible failure"

The problem isn't really "the AI system is worse at achieving its intended goal;" like you say, it's that influence-seeking AI systems will eventually be in conflict with humans, and that's bad news if AI systems are much more capable/powerful than we are.

[AI systems] wind up controlling or creating that military power and expropriating humanity (which couldn't fight back thereafter even if unified)

Failure would presumably occur before we get to the stage of "robot army can defeat unified humanity"---failure should happen soon after it becomes possible, and there are easier ways to fail than to win a clean war. Emphasizing this may give people the wrong idea, since it makes unity and stability seem like a solution rather than a stopgap. But emphasizing the robot army seems to have a similar problem---it doesn't really matter whether there is a literal robot army, you are in trouble anyway.

Comment by paulfchristiano on What failure looks like · 2019-03-26T20:01:47.191Z · score: 9 (4 votes) · LW · GW

I think of #3 and #5 as risk factors that compound the risks I'm describing---they are two (of many!) ways that the detailed picture could look different, but don't change the broad outline. I think it's particularly important to understand what failure looks like under a more "business as usual" scenario, so that people can separate objections to the existence of any risk from objections to other exacerbating factors that we are concerned about (like fast takeoff, war, people being asleep at the wheel, etc.)

I'd classify #1, #2, and #4 as different problems not related to intent alignment per se (though intent alignment may let us build AI systems that can help address these problems). I think the more general point is: if you think AI progress is likely to drive many of the biggest upcoming changes in the world, then there will be lots of risks associated with AI. Here I'm just trying to clarify what happens if we fail to solve intent alignment.

Comment by paulfchristiano on What's wrong with these analogies for understanding Informed Oversight and IDA? · 2019-03-23T16:11:00.604Z · score: 2 (1 votes) · LW · GW
Can you quote these examples? The word "example" appears 27 times in that post and looking at the literal second and third examples, they don't seem very relevant to what you've been saying here so I wonder if you're referring to some other examples.

Subsections "Modeling" and "Alien reasoning" of "Which C are hard to epistemically dominate?"

What I'm inferring from this (as far as a direct answer to my question) is that an overseer trying to do Informed Oversight on some ML model doesn't need to reverse engineer the model enough to fully understand what it's doing, only enough to make sure it's not doing something malign, which might be a lot easier, but this isn't quite reflected in the formal definition yet or isn't a clear implication of it yet. Does that seem right?

You need to understand what facts the model "knows." This isn't value-loaded or sensitive to the notion of "malign," but it's still narrower than "fully understand what it's doing."

As a simple example, consider linear regression. I think that linear regression probably doesn't know anything you don't. Yet doing linear regression is a lot easier than designing a linear model by hand.

If that's what you do, it seems “P outputs true statements just in the cases I can check.” could have a posterior that's almost 50%, which doesn't seem safe, especially in an iterated scheme where you have to depend on such probabilities many times?

Where did 50% come from?

Also "P outputs true statements in just the cases I check" is probably not catastrophic, it's only catastrophic once P performs optimization in order to push the system off the rails.

Comment by paulfchristiano on What's wrong with these analogies for understanding Informed Oversight and IDA? · 2019-03-20T18:26:52.204Z · score: 13 (4 votes) · LW · GW

A universal reasoner is allowed to use an intuition "because it works." They only take on extra obligations once that intuition reflects more facts about the world which can't be cashed out as predictions that can be confirmed on the same historical data that led us to trust the intuition.

For example, you have an extra obligation if Ramanujan has some intuition about why theorem X is true, you come to trust such intuitions by verifying them against proof of X, but the same intuitions also suggest a bunch of other facts which you can't verify.

In that case, you can still try to be a straightforward Bayesian about it, and say "our intuition supports the general claim that process P outputs true statements;" you can then apply that regularity to trust P on some new claim even if it's not the kind of claim you could verify, as long as "P outputs true statements" had a higher prior than "P outputs true statements just in the cases I can check." That's an argument that someone can give to support a conclusion, and "does process P output true statements historically?" is a subquestion you can ask during amplification.
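The Bayesian point can be made concrete with a small sketch. The hypothesis names, priors, and pass probabilities below are made up purely for illustration. The one structural assumption is that "P outputs true statements" and "P outputs true statements just in the cases I can check" both predict every check passing, so checking can rule out an unreliable P but cannot move the odds between those two; that ratio stays wherever the prior put it.

```python
from fractions import Fraction


def posterior(priors: dict, pass_prob: dict, n_checks: int) -> dict:
    """Posterior over hypotheses after n_checks verified-true outputs.

    priors[h] is the prior probability of hypothesis h, and pass_prob[h]
    is the probability h assigns to any single check coming out true.
    """
    unnormalized = {h: priors[h] * pass_prob[h] ** n_checks for h in priors}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Hypothetical numbers, chosen only for illustration.
priors = {
    "general": Fraction(6, 10),     # P outputs true statements
    "narrow": Fraction(1, 10),      # ... only in the cases I can check
    "unreliable": Fraction(3, 10),  # P is right half the time
}
pass_prob = {
    "general": Fraction(1),
    "narrow": Fraction(1),
    "unreliable": Fraction(1, 2),
}

post = posterior(priors, pass_prob, n_checks=10)
odds_general_vs_narrow = post["general"] / post["narrow"]
```

Here the ten checks nearly eliminate "unreliable," while the odds of "general" over "narrow" remain exactly the prior odds of 6 to 1.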

The problem becomes hard when there are further facts that can't be supported by this Bayesian reasoning (and therefore might undermine it). E.g. you have a problem if process P is itself a consequentialist, who outputs true statements in order to earn your trust but will eventually exploit that trust for their own advantage. In this case, the problem is that there is something going on internally inside process P that isn't surfaced by P's output. Epistemically dominating P requires knowing about that.

See the second and third examples in the post introducing ascription universality. There is definitely a lot of fuzziness here and it seems like one of the most important places to tighten up the definition / one of the big research questions for whether ascription universality is possible.

Comment by paulfchristiano on What failure looks like · 2019-03-18T15:24:34.957Z · score: 11 (5 votes) · LW · GW
But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies?

To a large extent "ML" refers to a few particular technologies that have the form "try a bunch of things and do more of what works" or "consider a bunch of things and then do the one that is predicted to work."

That is true but I think of this as a limitation of contemporary ML approaches rather than a fundamental property of advanced AI.

I'm mostly aiming to describe what I think is in fact most likely to go wrong; I agree it's not a general or necessary feature of AI that its comparative advantage is optimizing easy-to-measure goals.

(I do think there is some real sense in which getting over this requires "solving alignment.")

Comment by paulfchristiano on What failure looks like · 2019-03-18T04:21:45.450Z · score: 5 (3 votes) · LW · GW

I'm not mostly worried about influence-seeking behavior emerging by "specify a goal" --> "getting influence is the best way to achieve that goal." I'm mostly worried about influence-seeking behavior emerging within a system by virtue of selection within that process (and by randomness at the lowest level).

## What failure looks like

2019-03-17T20:18:59.800Z · score: 204 (77 votes)
Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-15T15:57:44.659Z · score: 4 (2 votes) · LW · GW

I don't see why their methods would be elegant. In particular, I don't see why any of {the anthropic update, importance weighting, updating from the choice of universal prior} would have a simple form (simpler than the simplest physics that gives rise to life).

I don't see how MAP helps things either---doesn't the same argument suggest that for most of the possible physics, the simplest model will be a consequentialist? (Even more broadly, for the universal prior in general, isn't MAP basically equivalent to a random sample from the prior, since some random model happens to be slightly more compressible?)

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-15T01:20:30.341Z · score: 4 (2 votes) · LW · GW
I say maybe 1/5 chance it’s actually dominated by consequentialists

Do you get down to 20% because you think this argument is wrong, or because you think it doesn't apply?

What problem do you think bites you?

What's ? Is it O(1) or really tiny? And which value of do you want to consider, polynomially small or exponentially small?

But if it somehow magically predicted which actions BoMAI was going to take in no time at all, then c would have to be above 1/d.

Wouldn't they have to also magically predict all the stochasticity in the observations, and have a running time that grows exponentially in their log loss? Predicting what BoMAI will do seems likely to be much easier than that.

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-14T17:04:49.765Z · score: 5 (3 votes) · LW · GW

This invalidates some of my other concerns, but also seems to mean things are incredibly weird at finite times. I suspect that you'll want to change to something less extreme here.

(I might well be misunderstanding something, apologies in advance.)

Suppose the "intended" physics take at least 1E15 steps to run on the UTM (this is a conservative lower bound, since you have to simulate the human for the whole episode). And suppose (I think you need much lower than this). Then the intended model gets penalized by at least exp(1E12) for its slowness.

For almost the same description complexity, I could write down physics + "precompute the predictions for the first N episodes, for every sequence of possible actions/observations, and store them in a lookup table." This increases the complexity by a few bits, some constant plus K(N|physics), but avoids most of the computation. In order for the intended physics to win, i.e. in order for the "speed" part of the speed prior to do anything, we need the complexity of this precomputed model to be at least 1E12 bits higher than the complexity of the fast model.

That appears to happen only once N > BB(1E12). Does that seem right to you?
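To put the two quantities in the same units, the sketch below converts the exp(1E12) slowness penalty into bits and compares it with the extra description length of the lookup-table model. It assumes a model's prior weight is 2^(-description length) multiplied by its speed penalty, and it treats the lookup table's own running time as negligible. Both are simplifications made for illustration, and the function names are hypothetical.

```python
import math


def penalty_in_bits(log_penalty_nats: float) -> float:
    """Convert a penalty of exp(log_penalty_nats) into an equivalent number of bits."""
    return log_penalty_nats / math.log(2)


# The intended physics is penalized by at least exp(1E12) for slowness.
slow_model_penalty_bits = penalty_in_bits(1e12)


def lookup_table_wins(extra_description_bits: float) -> bool:
    """True if the precomputed model outweighs the intended (slow) physics.

    extra_description_bits is the added complexity of the lookup-table
    model: some constant plus K(N | physics).
    """
    return extra_description_bits < slow_model_penalty_bits
```

On this accounting the slowness penalty is worth roughly 1.44E12 bits, so the precomputed model keeps winning until specifying N costs about that many bits.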

We could talk about whether malign consequentialists also take over at finite times (I think they probably do, since the "speed" part of the speed prior is not doing any work until after BB(1E12) steps, long after the agent becomes incredibly smart), but it seems better to adjust the scheme first.

Using the speed prior seems more reasonable, but I'd want to know which version of the speed prior and which parameters, since which particular problem bites you will depend on those choices. And maybe to save time, I'd want to first get your take on whether the proposed version is dominated by consequentialists at some finite time.

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-14T04:50:12.723Z · score: 4 (2 votes) · LW · GW

From the formal description of the algorithm, it looks like you use a universal prior to pick , and then allow the Turing machine to run for steps, but don't penalize the running time of the machine that outputs . Is that right? That didn't match my intuitive understanding of the algorithm, and seems like it would lead to strange outcomes, so I feel like I'm misunderstanding.

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-14T04:48:33.191Z · score: 2 (1 votes) · LW · GW

(I actually have a more basic confusion, started a new thread.)

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-14T04:21:36.569Z · score: 2 (1 votes) · LW · GW

(ETA: I think this discussion depended on a detail of your version of the speed prior that I misunderstood.)

Given a world model ν, which takes k computation steps per episode, let νlog be the best world-model that best approximates ν (in the sense of KL divergence) using only log k computation steps. νlog is at least as good as the “reasoning-based replacement” of ν.
The description length of νlog is within a (small) constant of the description length of ν. That way of describing it is not optimized for speed, but it presents a one-time cost, and anyone arriving at that world-model in this way is paying that cost.

To be clear, that description gets ~0 mass under the speed prior, right? A direct specification of the fast model is going to have a much higher prior than a brute force search, at least for values of large enough (or small enough, however you set it up) to rule out the alien civilization that is (probably) the shortest description without regard for computational limits.

One could consider instead νlogε, which is, among the world-models that ε-approximate ν in less than log k computation steps (if the set is non-empty), the first such world-model found by a searching procedure ψ. The description length of νlogε is within a (slightly larger) constant of the description length of ν, but the one-time computational cost is less than that of νlog.

Within this chunk of the speed prior, the question is: what are good ψ? Any reasonable specification of a consequentialist would work (plus a few more bits for it to understand its situation, though most of the work is done by handing it ), or of a petri dish in which a consequentialist would eventually end up with influence. Do you have a concrete alternative in mind, which you think is not dominated by some consequentialist (i.e. a ψ for which every consequentialist is either slower or more complex)?

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-13T17:34:21.061Z · score: 2 (1 votes) · LW · GW
Once all the subroutines are "baked into its architecture" you just have: the algorithm "predict accurately" + "treacherous turn"

You only have to bake in the innermost part of one loop in order to get almost all the computational savings.

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-13T17:31:54.293Z · score: 2 (1 votes) · LW · GW
a reasoning-based order (at least a Bayesian-reasoning-based order) should really just be called a posterior

Reasoning gives you a prior that is better than the speed prior, before you see any data. (*Much* better, limited only by the fact that the speed prior contains strategies which use reasoning.)

The reasoning in this case is not a Bayesian update. It's evaluating possible approximations *by reasoning about how well they approximate the underlying physics, itself inferred by a Bayesian update*, not by directly seeing how well they predict on the data so far.

The description length of the "do science" strategy (I contend) is less than the description length of the "do science" + "treacherous turn" strategy.

I think the only good arguments for this are in the limit where you don't care about simplicity at all and only care about running time, since then you can rule out all reasoning. The threshold where things start working depends on the underlying physics, for more computationally complex physics you need to pick larger and larger computation penalties to get the desired result.

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-12T16:58:22.303Z · score: 2 (1 votes) · LW · GW
but that's exactly what we're doing to

It seems totally different from what we're doing, though I may be misunderstanding the analogy.

Suppose I look out at the world and do some science, e.g. discovering the standard model. Then I use my understanding of science to design great prediction algorithms that run fast, but are quite complicated owing to all of the approximations and heuristics baked into them.

The speed prior gives this model a very low probability because it's a complicated model. But "do science" gives this model a high probability, because it's a simple model of physics, and then the approximations follow from a bunch of reasoning on top of that model of physics. We aren't trading off "shortness" for speed---we are trading off "looks good according to reasoning" for speed. Yes they are both arbitrary orders, but one of them systematically contains better models earlier in the order, since the output of reasoning is better than a blind prioritization of shorter models.

Of course the speed prior also includes a hypothesis that does "science with the goal of making good predictions," and indeed Wei Dai and I are saying that this is the part of the speed prior that will dominate the posterior. But now we are back to potentially-malign consequentialism. The cognitive work being done internally to that hypothesis is totally different from the work being done by updating on the speed prior (except insofar as the speed prior literally contains a hypothesis that does that work).

In other words:

Suppose physics takes n bits to specify, and a reasonable approximation takes N >> n bits to specify. Then the speed prior, working in the intended way, takes N bits to arrive at the reasonable approximation. But the aliens take n bits to arrive at the standard model, and then once they've done that can immediately deduce the N bit approximation. So it sure seems like they'll beat the speed prior. Are you objecting to this argument?

(In fact the speed prior only actually takes n + O(1) bits, because it can specify the "do science" strategy, but that doesn't help here since we are just trying to say that the "do science" strategy dominates the speed prior.)
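The size of that advantage is easy to put in numbers. The sketch below works in log2 space, since the raw probabilities underflow, and ignores the computation penalty entirely so that only description length is compared. The specific values of n, N, and the constant overhead are hypothetical, chosen only to show the shape of the calculation.

```python
def log2_prior_direct(N_bits: int) -> int:
    """log2 prior weight of writing down the N-bit approximation directly."""
    return -N_bits


def log2_prior_via_science(n_bits: int, overhead_bits: int) -> int:
    """log2 prior weight of specifying n-bit physics plus a constant-size
    'do science, then derive the approximation' wrapper."""
    return -(n_bits + overhead_bits)


def log2_advantage(n_bits: int, N_bits: int, overhead_bits: int) -> int:
    """How many bits the 'do science' route gains over direct specification."""
    return log2_prior_via_science(n_bits, overhead_bits) - log2_prior_direct(N_bits)


# Hypothetical sizes: physics in 1,000 bits, approximation in 1,000,000 bits,
# and 100 bits of overhead for the wrapper.
advantage_bits = log2_advantage(n_bits=1_000, N_bits=1_000_000, overhead_bits=100)
```

With these made-up sizes the reasoning-based route is favored by 998,900 bits, i.e. a factor of 2^998900 in prior weight, before any data is seen.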

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-12T16:46:47.846Z · score: 3 (2 votes) · LW · GW
The only other approach I can think of is trying to do the anthropic update ourselves.

If you haven't seen Jessica's post in this area, it's worth taking a quick look.

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-12T16:44:47.138Z · score: 2 (1 votes) · LW · GW

I just mean: "universality" in the sense of a UTM isn't a sufficient property when defining the speed prior, the analogous property of the UTM is something more like: "You can run an arbitrary Turing machine without too much slowdown." Of course that's not possible, but it seems like you still want to be as close to that as possible (for the same reasons that you wanted universality at all).

I agree that it would be fine to sacrifice this property if it was helpful for safety.

Comment by paulfchristiano on Asymptotically Unambitious AGI · 2019-03-12T01:48:22.856Z · score: 2 (1 votes) · LW · GW
Using "reasoning" to pick which one to favor, is just picking the first one in some new order.

Yes, some new order, but not an arbitrary one. The resulting order is going to be better than the speed prior order, so we'll update in favor of the aliens and away from the rest of the speed prior.

one can't escape the necessity to introduce the arbitrary criterion of "valuing" earlier things on the list

Probably some miscommunication here. No one is trying to object to the arbitrariness, we're just making the point that the aliens have a lot of leverage with which to beat the rest of the speed prior.

(They may still not be able to if the penalty for computation is sufficiently steep---e.g. if you penalize based on circuit complexity so that the model might as well bake in everything that doesn't depend on the particular input at hand. I think it's an interesting open question whether that avoids all problems of this form, which I unsuccessfully tried to get at here.)

## Security amplification

2019-02-06T17:28:19.995Z · score: 20 (4 votes)

## Reliability amplification

2019-01-31T21:12:18.591Z · score: 21 (5 votes)

## Techniques for optimizing worst-case performance

2019-01-28T21:29:53.164Z · score: 23 (6 votes)

## Thoughts on reward engineering

2019-01-24T20:15:05.251Z · score: 29 (4 votes)

## Learning with catastrophes

2019-01-23T03:01:26.397Z · score: 26 (8 votes)

## Capability amplification

2019-01-20T07:03:27.879Z · score: 24 (7 votes)

## The reward engineering problem

2019-01-16T18:47:24.075Z · score: 23 (4 votes)

## Towards formalizing universality

2019-01-13T20:39:21.726Z · score: 29 (6 votes)

## Directions and desiderata for AI alignment

2019-01-13T07:47:13.581Z · score: 29 (6 votes)

## Ambitious vs. narrow value learning

2019-01-12T06:18:21.747Z · score: 19 (5 votes)

## AlphaGo Zero and capability amplification

2019-01-09T00:40:13.391Z · score: 26 (10 votes)

## Supervising strong learners by amplifying weak experts

2019-01-06T07:00:58.680Z · score: 28 (7 votes)

## Benign model-free RL

2018-12-02T04:10:45.205Z · score: 10 (2 votes)

## Corrigibility

2018-11-27T21:50:10.517Z · score: 39 (9 votes)

## Humans Consulting HCH

2018-11-25T23:18:55.247Z · score: 19 (3 votes)

## Approval-directed bootstrapping

2018-11-25T23:18:47.542Z · score: 19 (4 votes)

## Approval-directed agents

2018-11-22T21:15:28.956Z · score: 22 (4 votes)

## Prosaic AI alignment

2018-11-20T13:56:39.773Z · score: 36 (9 votes)

## An unaligned benchmark

2018-11-17T15:51:03.448Z · score: 27 (6 votes)

## Clarifying "AI Alignment"

2018-11-15T14:41:57.599Z · score: 54 (16 votes)

## The Steering Problem

2018-11-13T17:14:56.557Z · score: 38 (10 votes)

## Preface to the sequence on iterated amplification

2018-11-10T13:24:13.200Z · score: 39 (14 votes)

## The easy goal inference problem is still hard

2018-11-03T14:41:55.464Z · score: 38 (9 votes)

## Could we send a message to the distant future?

2018-06-09T04:27:00.544Z · score: 40 (14 votes)

## When is unaligned AI morally valuable?

2018-05-25T01:57:55.579Z · score: 101 (31 votes)

## Open question: are minimal circuits daemon-free?

2018-05-05T22:40:20.509Z · score: 112 (36 votes)

## Weird question: could we see distant aliens?

2018-04-20T06:40:18.022Z · score: 85 (25 votes)

## Implicit extortion

2018-04-13T16:33:21.503Z · score: 74 (22 votes)

## Prize for probable problems

2018-03-08T16:58:11.536Z · score: 135 (37 votes)

## Argument, intuition, and recursion

2018-03-05T01:37:36.120Z · score: 99 (29 votes)

## Funding for AI alignment research

2018-03-03T21:52:50.715Z · score: 108 (29 votes)

## Funding for independent AI alignment research

2018-03-03T21:44:44.000Z · score: 0 (0 votes)

## The abruptness of nuclear weapons

2018-02-25T17:40:35.656Z · score: 95 (35 votes)

2018-02-25T04:53:36.083Z · score: 104 (35 votes)

## Funding opportunity for AI alignment research

2017-08-27T05:23:46.000Z · score: 1 (1 votes)

## Ten small life improvements

2017-08-20T19:09:23.673Z · score: 26 (19 votes)

## Crowdsourcing moderation without sacrificing quality

2016-12-02T21:47:57.719Z · score: 15 (11 votes)

## Optimizing the news feed

2016-12-01T23:23:55.403Z · score: 9 (10 votes)

## The universal prior is malign

2016-11-30T22:31:41.000Z · score: 4 (4 votes)

## Recent AI control posts

2016-11-29T18:53:57.656Z · score: 12 (13 votes)

## My recent posts

2016-11-29T18:51:09.000Z · score: 5 (5 votes)

## If we can't lie to others, we will lie to ourselves

2016-11-26T22:29:54.990Z · score: 25 (18 votes)

## Less costly signaling

2016-11-22T21:11:06.028Z · score: 14 (16 votes)

## Control and security

2016-10-15T21:11:55.000Z · score: 3 (3 votes)

## What is up with carbon dioxide and cognition? An offer

2016-04-23T17:47:43.494Z · score: 38 (31 votes)

## Time hierarchy theorems for distributional estimation problems

2016-04-20T17:13:19.000Z · score: 2 (2 votes)

## Another toy model of the control problem

2016-01-30T01:50:12.000Z · score: 1 (1 votes)

## My current take on logical uncertainty

2016-01-29T21:17:33.000Z · score: 2 (2 votes)