What's been written about the nature of "son-of-CDT"? 2019-11-30T21:03:44.958Z
What is the connection between these two definitions of ascription universality? 2019-10-27T18:03:20.369Z
How does the organization "EthAGI" fit into the broader AI safety landscape? 2019-07-08T00:46:02.191Z
Is it good practice to write questions/comments on old posts you're trying to understand? 2019-06-27T09:23:01.619Z
Evidence other than evolution for optimization daemons? 2019-04-21T20:50:18.986Z


Comment by liam-donovan on [Link] Ignorance, a skilled practice · 2020-02-09T14:48:39.896Z · LW · GW

Why are you calling this a nitpick? IMO it's a major problem with the post -- I was very unhappy that no mention was made of this obvious problem with the reasoning presented.

Comment by liam-donovan on The Rocket Alignment Problem · 2020-01-20T20:38:24.483Z · LW · GW

For what it's worth, I was just learning about the basics of MIRI's research when this came out, and reading it made me less convinced of the value of MIRI's research agenda. That's not necessarily a major problem, since the expected change in belief after encountering a given post should be 0, and I already had a lot of trust in MIRI. However, I found this post by Jessica Taylor vastly clearer and more persuasive (it was written before "Rocket Alignment", but I read "Rocket Alignment" first). In particular, I would expect AI researchers to be much more competent than the portrayal of spaceplane engineers in the post, and it wasn't clear to me why the analogy should be strong Bayesian evidence for MIRI being correct.

Comment by liam-donovan on Bay Solstice 2019 Retrospective · 2020-01-18T13:21:25.513Z · LW · GW

Maybe people who rationalized their failure to lose weight by "well, even Eliezer is overweight, it's just metabolic disprivilege"

Comment by liam-donovan on Bay Solstice 2019 Retrospective · 2020-01-17T16:47:02.750Z · LW · GW

How many people raised their hands when Eliezer asked about the probability estimate? When I was watching the video I gave a probability estimate of 65%, and I'm genuinely shocked that "not many" people thought he had over a 55% chance. This is Eliezer we're talking about...

Comment by liam-donovan on Criticism as Entertainment · 2020-01-11T11:06:10.364Z · LW · GW

I wonder if it negatively impacts the cohesiveness/teamwork ability of the resulting AI safety community by disproportionately attracting a certain type of person? It seems unlikely that everyone would enjoy this style

Comment by liam-donovan on romeostevensit's Shortform · 2020-01-03T06:36:28.480Z · LW · GW


Comment by liam-donovan on 2020's Prediction Thread · 2020-01-01T05:32:45.183Z · LW · GW

FWIW you can bet on some of these on PredictIt -- for example, PredictIt assigns only a 47% chance that Trump will win in 2020. That's not a huge difference, but still worth betting 5% of your bankroll (after fees) on if you bet half-Kelly. (if you want to bet with me for whatever reason, I'd also be willing to bet up to $700 that Trump doesn't win at PredictIt odds if I don't have to tie up capital)
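
For readers unfamiliar with Kelly sizing, here is a rough sketch of how a half-Kelly stake on a binary contract could be computed. It is an illustration only, not the calculation behind the 5% figure: the function name `kelly_fraction`, the believed probability `p`, and the `fee_on_profit` parameter are assumptions introduced here, and only the 47% market price comes from the comment.

```python
def kelly_fraction(p: float, price: float, fee_on_profit: float = 0.0) -> float:
    """Full-Kelly fraction of bankroll to stake on a binary contract.

    p             -- your own probability that the contract pays out (hypothetical input)
    price         -- cost of a contract that pays 1 if the event happens (e.g. 0.47)
    fee_on_profit -- share of winnings taken as a fee (hypothetical parameter)
    """
    # Net odds b: profit per unit staked if the bet wins, after the fee on profit.
    b = (1.0 - price) * (1.0 - fee_on_profit) / price
    # Standard Kelly formula f* = p - q / b, floored at zero (no bet without an edge).
    return max(0.0, p - (1.0 - p) / b)


def half_kelly(p: float, price: float, fee_on_profit: float = 0.0) -> float:
    """Half of the full-Kelly stake, a common way to reduce variance."""
    return 0.5 * kelly_fraction(p, price, fee_on_profit)
```

Under these assumptions, a bettor who believes 55% against a 47% price with no fees would get a full-Kelly stake of roughly 15% of bankroll and a half-Kelly stake of about half that; any fee on profits shrinks both.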

Comment by liam-donovan on 2010s Predictions Review · 2019-12-30T23:55:15.121Z · LW · GW

We can test if the most popular books & music of 2019 sold fewer copies than the most popular books & music of 2009 (I might or might not look into this later)

Comment by liam-donovan on Programmers Should Plan For Lower Pay · 2019-12-29T07:09:10.478Z · LW · GW
GDP is 2x higher than in 2000

Why not use per capita real GDP (+25% since 2000)?

Comment by liam-donovan on How’s that Epistemic Spot Check Project Coming? · 2019-12-28T06:23:13.313Z · LW · GW

I'm thinking that if there were liquid prediction markets for amplifying ESCs, people could code bots to do exactly what John suggests and potentially make money. This suggests to me that there's no principled difference between the two ideas, though I could be missing something (maybe you think the bot is unlikely to beat the market?)

Comment by liam-donovan on Funk-tunul's Legacy; Or, The Legend of the Extortion War · 2019-12-27T23:03:32.660Z · LW · GW

Based on the quote from Jessica Taylor, it seems like the FDT agents are trying to maximize their long-term share of the population, rather than their absolute payoffs in a single generation? If I understand the model correctly, that means the FDT agents should try to maximize the ratio of FDT payoff : 9-bot payoff (to maximize the ratio of FDT:9-bot in the next generation). The algebra then shows that they should refuse to submit to 9-bots once the population of FDT agents gets high enough (Wolfram|Alpha link), without needing to drop the random encounters assumption.

It still seems like CDT agents would behave the same way given the same goals, though?

Comment by liam-donovan on How’s that Epistemic Spot Check Project Coming? · 2019-12-26T21:32:37.830Z · LW · GW

What's the difference between John's suggestion and amplifying ESCs with prediction markets? (not rhetorical)

Comment by liam-donovan on 2019 AI Alignment Literature Review and Charity Comparison · 2019-12-19T05:17:07.209Z · LW · GW

I was somewhat confused by the discussion of LTFF grants being rejected by CEA; is there a public writeup of which grants were rejected?

Comment by liam-donovan on Embedded World-Models · 2019-12-09T11:40:48.722Z · LW · GW

In order to do this, the agent needs to be able to reason approximately about the results of their own computations, which is where logical uncertainty comes in

Comment by liam-donovan on Decision Theory · 2019-12-09T11:30:53.169Z · LW · GW

Why does being updateless require thinking through all possibilities in advance? Can you not make a general commitment to follow UDT, but wait until you actually face the decision problem to figure out which specific action UDT recommends taking?

Comment by liam-donovan on Q&A with Shane Legg on risks from AI · 2019-12-09T04:14:35.655Z · LW · GW

Well, it's been 8 years; how close are ML researchers to a "proto-AGI" with the capabilities listed? (embarrassingly, I have no idea what the answer is)

Comment by liam-donovan on There's No Fire Alarm for Artificial General Intelligence · 2019-12-08T17:28:00.420Z · LW · GW

Apparently an LW user did a series of interviews with AI researchers in 2011, some of which included a similar question. I know most LW users have probably seen this, but I only found it today and thought it was worth flagging here.

Comment by liam-donovan on There is a war. · 2019-12-08T14:40:02.950Z · LW · GW
What are the competing explanations for high time preference?

A better way to phrase my confusion: How do we know the current time preference is higher than what we would see in a society that was genuinely at peace?

The competing explanations I was thinking of were along the lines of "we instinctively prefer having stuff now to having stuff later"

Comment by liam-donovan on What's been written about the nature of "son-of-CDT"? · 2019-12-06T21:34:59.190Z · LW · GW

Yeah, I was implicitly assuming that initiating a successor agent would force Omega to update its predictions about the new agent (and put the $1m in the box). As you say, that's actually not very relevant, because it's a property of a specific decision problem rather than CDT or son-of-CDT.

Comment by liam-donovan on Robust Agency for People and Organizations · 2019-12-03T14:05:11.214Z · LW · GW

(I apologize in advance if this is too far afield of the intended purpose of this post)

How does the claim that "group agents require membranes" interact with the widespread support for dramatically reducing or eliminating restrictions to immigration ("open borders" for short) within the EA/LW community? I can think of several possibilities, but I'm not sure which is true:

  • There actually isn't much support for open borders
  • Open borders supporters believe that "group agents require membranes" is a reasonable generalization, but borders are not a relevant kind of "membrane", or nations are not "group agents" in the relevant sense
  • The people who support open borders generally aren't the same people who are thinking about group agency at all
  • Open borders supporters have thought about group agency and concluded that "group agents require membranes" is not a reasonable generalization
  • Open borders supporters believe that there is no need for nations to have group agency
  • Something else I haven't thought of

Context: I have an intuition that reduced/eliminated immigration restrictions reduce global coordination, and this post helped me crystallize it (if nations have less group agency, it's harder to coordinate)

Comment by liam-donovan on The "Commitment Races" problem · 2019-12-02T18:57:12.477Z · LW · GW

Would trying to become less confused about commitment races before building a superintelligent AI count as a metaphilosophical approach or a decision theoretic one (or neither)? I'm not sure I understand the dividing line between the two.

Comment by liam-donovan on A Practical Theory of Memory Reconsolidation · 2019-12-02T17:43:21.253Z · LW · GW
if you're interested in anything in particular, I'll be happy to answer.

I very much appreciate the offer! I can't think of anything specific, though; the comments of yours that I find most valuable tend to be "unknown unknowns" that suggest a hypothesis I wouldn't previously have been able to articulate.

Comment by liam-donovan on A Practical Theory of Memory Reconsolidation · 2019-12-02T11:40:59.221Z · LW · GW

Have you written anything like "cousin_it's life advice"? I often find your comments extremely insightful in a way that combines the best of LW ideas with wisdom from other areas, and would love to read more.

Comment by liam-donovan on Buck's Shortform · 2019-12-02T10:55:02.929Z · LW · GW
The prior probability ratio is 1:99, and the likelihood ratio is 20:1, so the posterior probability is 120:991 = 20:99, so you have probability of 20/(20+99) of having breast cancer.

What does "120:991" mean here?
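
One plausible reading, assuming the multiplication signs were lost in formatting, is the odds product 1×20 : 99×1 = 20:99. A minimal sketch of that odds-form Bayes update follows; the function name `posterior_probability` is hypothetical and the numbers are the ones in the quote.

```python
from fractions import Fraction


def posterior_probability(prior_for: int, prior_against: int,
                          lr_for: int, lr_against: int) -> Fraction:
    """Odds-form Bayes update: posterior odds = prior odds x likelihood ratio."""
    # Multiply each side of the prior odds by the matching side of the likelihood ratio.
    odds_for = prior_for * lr_for
    odds_against = prior_against * lr_against
    # Convert odds a:b into the probability a / (a + b).
    return Fraction(odds_for, odds_for + odds_against)


# Prior odds 1:99 and likelihood ratio 20:1, as in the quoted passage.
result = posterior_probability(1, 99, 20, 1)
```

With these inputs the posterior odds are 20:99, which matches the quoted probability of 20/(20+99).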

Comment by liam-donovan on What's been written about the nature of "son-of-CDT"? · 2019-12-01T16:48:02.194Z · LW · GW

After thinking about it some more, I don't think this is true.

A concrete example: Let's say there's a CDT paperclip maximizer in an environment with Newcomb-like problems that's deciding between 3 options.

1. Don't hand control to any successor

2. Hand off control to an "LDT about correlations formed after 7am, CDT about correlations formed before 7am" successor

3. Hand off control to an LDT successor.

My understanding is that the CDT agent would take the choice that causes the highest number of paperclips to be created (in expectation). If both successors are aligned with the CDT agent, I would expect the CDT agent to choose option #3. The LDT successor agent would be able to gain more resources (and thus create more paperclips) than the other two possible agents, when faced with a Newcomb-like problem with correlations formed before the succession time. The CDT agent can cause this outcome to happen if and only if it chooses option #3.

I'm not at all sure that son-of-CDT resembles any known logical decision theory, but I don't see why it would resemble "LDT about correlations formed after 7am, CDT about correlations formed before 7am".

Edit: I agree that a CDT agent will never agree to precommit to acting like an LDT agent for correlations that have already been created, but I don't think that determines what kind of successor agent they would choose to create.

Comment by liam-donovan on There is a war. · 2019-11-26T15:41:23.030Z · LW · GW

That makes sense to me, but unfortunately I'm no closer to understanding the quoted passage. Some specific confusions:

  • What's the link between death rate and time preference? My best guess is that declining life expectancy implies scarcity, but I also don't get....
  • the link between scarcity and time preference? My best guess is that high time preference means people don't put the work in to ensure sufficient future productive capacity, but that doesn't help me understand the quote so I think I'm missing something.
  • I get why emergency mobilization increases time preference, but not why high time preference is strong evidence of emergency mobilization (as opposed to other possible explanations)

Comment by liam-donovan on There is a war. · 2019-11-26T15:04:49.016Z · LW · GW

Can someone explain/point me to useful resources to understand the idea of time preference as expressed in this post? In particular, I'm struggling to understand these sentences:

This suggests that near the center time preference has increased to the point where we’re creating scarcity faster than we’re alleviating it, while at the periphery scarcity is still actually being alleviated because there’s enough scarcity to go around, or perhaps marginal areas do not suffer so much from total mobilization.

I also don't understand why having an internal rate of return of 10% is evidence that we're in an emergency state of mobilization (relative to the hypothesis that managers are poorly incentivized to do long-term planning for other reasons).

Comment by liam-donovan on There is a war. · 2019-11-26T14:54:07.744Z · LW · GW

I think quantitative easing is an example (if I understood the post correctly, which I'm not sure about). By buying up bonds, the government is putting more dollars into the economy, which reduces the "amount of stuff produced per dollar", thus creating scarcity (in other words, QE increases aggregate demand). To alleviate this pressure, people make more stuff in order to meet the excess demand (i.e. unemployment rates go down). Forcing the unemployment rate down is the same as "requiring almost everyone to do things"

Comment by liam-donovan on Matthew Walker's "Why We Sleep" Is Riddled with Scientific and Factual Errors · 2019-11-19T00:21:15.952Z · LW · GW

Maybe the claim that climate scientists are liars? I don't know if it's true, but if I knew it were false I'd definitely downvote the post...

Comment by liam-donovan on Can corrigibility be learned safely? · 2019-11-03T17:31:25.428Z · LW · GW

I understand that, but I don't see why #2 is likely to be achievable. Corrigibility seems very similar to Wei Dai's translation example, so it seems like there could be many deceptive actions that humans would intuitively recognize as not corrigible, but which would fool an early-stage LBO tree into assigning a high reward. This seems like it would be a clear example of "giving a behaviour a high reward because it is bad". Unfortunately I can't think of any good examples, so my intuition may simply be mistaken.

Incidentally, it seems like Ought could feasibly test whether meta-execution is sufficient to ensure corrigibility; for example, a malicious expert could recommend deceptive/influence-seizing actions to an agent in a simulated environment, and the meta-execution tree would have to detect every deceptive action without any contextual knowledge. Are there any plans to do this?

Comment by liam-donovan on Understanding Iterated Distillation and Amplification: Claims and Oversight · 2019-11-03T17:02:13.736Z · LW · GW

That makes sense; so it's a general method that's applicable whenever the bandwidth is too low for an individual agent to construct the relevant ontology?

Comment by liam-donovan on Understanding Iterated Distillation and Amplification: Claims and Oversight · 2019-11-03T16:57:41.504Z · LW · GW
plus maybe other properties

That makes sense; I hadn't thought of the possibility that a security failure in the HBO tree might be acceptable in this context. OTOH, if there's an input that corrupts the HBO tree, isn't it possible that the corrupted tree could output a supposed "LBO overseer" that embeds the malicious input and corrupts us when we try to verify it? If the HBO tree is insecure, it seems like a manual process that verifies its output must be insecure as well.

Comment by liam-donovan on Some problems with making induction benign, and approaches to them · 2019-10-31T11:47:04.206Z · LW · GW

I don't understand the argument that a speed prior wouldn't work: wouldn't the abstract reasoner still have to simulate the aliens in order to know what output to read from the zoo earths? I don't understand how "simulate a zoo earth with a bitstream that is controlled by aliens in a certain way" would ever get a higher prior weight than "simulate an earth that never gets controlled by aliens". Is the idea that each possible zoo earth with simple-to-describe aliens has a relatively similar prior weight to the real earth, so they collectively have a much higher prior weight?

Comment by liam-donovan on Prediction markets for internet points? · 2019-10-27T20:48:09.153Z · LW · GW
I think it’s likely that these markets would quickly converge to better predictions than existing political prediction markets

Why would you expect this to be true? I (and presumably many others) spend a lot of time researching questions on existing political prediction markets because I can win large sums ($1k+ per question) doing so. I don't see why anyone would have an incentive to put in a similar amount of time to win Internet Points, and as a result I don't see why these markets would outperform existing political prediction markets. Is the idea that many people contributing a minimally-informed opinion will lead to more efficient results than a few people contributing a well-informed opinion + a bunch of noise traders?

Comment by liam-donovan on Eli's shortform feed · 2019-10-27T13:45:53.458Z · LW · GW

Is there any information on how Von Neumann came to believe Catholicism was the correct religion for Pascal's Wager purposes? "My wife is Catholic" doesn't seem like very strong evidence...

Comment by liam-donovan on Can corrigibility be learned safely? · 2019-10-26T21:21:07.129Z · LW · GW

How do you ensure that property #3 is satisfied in the early stages of the amplification process? Since no agent in the tree will have context, and the entire system isn't very powerful yet, it seems like there could easily be inputs that would naively generate a high reward "by being bad", which the overseer couldn't detect.

Comment by liam-donovan on bgaesop's Shortform · 2019-10-25T17:04:10.035Z · LW · GW

From an epistemic rationality perspective, isn't becoming less aware of your emotions and body a really bad thing? Not only does it give you false beliefs, but "not being in touch with your emotions/body" is already a stereotyped pitfall for a rationalist to fall into...

Comment by liam-donovan on Understanding Iterated Distillation and Amplification: Claims and Oversight · 2019-10-25T09:26:06.494Z · LW · GW

Is meta-execution HBO, LBO, or a general method that could be implemented with either? (my current credences: 60% LBO, 30% general method, 10% HBO)

Comment by liam-donovan on Understanding Iterated Distillation and Amplification: Claims and Oversight · 2019-10-25T09:12:03.945Z · LW · GW

How does this address the security issues with HBO? Is the idea that only using the HBO system to construct a "core for reasoning" reduces the chances of failure by exposing it to fewer inputs/using it for less total time? I feel like I'm missing something...

Comment by liam-donovan on bgaesop's Shortform · 2019-10-24T16:39:35.433Z · LW · GW


Comment by liam-donovan on Deleted · 2019-10-23T18:17:19.035Z · LW · GW

Yep, I misread the page, my mistake

Comment by liam-donovan on Deleted · 2019-10-23T18:16:45.280Z · LW · GW
and from my perspective this is a good thing, because it means we've made moral progress as a society.

I know this is off-topic, but I'm curious how you would distinguish between moral progress and "moral going-in-circles" (don't know what the right word is)?

Comment by liam-donovan on Deleted · 2019-10-23T17:53:40.419Z · LW · GW

(Keeping in mind that I have nothing to do with the inquiry and can't speak for OP)

Why is it desirable for the inquiry to turn up a representative sample of unpopular beliefs? If that were explicitly the goal, I would agree with you; I'd also agree (?) that questions with that goal shouldn't be allowed. However, I thought the idea was to have some examples of unpopular opinions to use in a separate research study, rather than to directly research what unpopular beliefs LW holds.

If the conclusion of the research turns out to be "here is a representative sample of unpopular LW beliefs: <a set of beliefs that doesn't include anything too reactionary/politically controversial>", that would be a dishonest & unfortunate conclusion.

Comment by liam-donovan on Deleted · 2019-10-23T16:12:35.633Z · LW · GW

I downvoted because I think the benefit of making stuff like this socially unacceptable on LW is higher than the cost of the OP getting one less response to their survey. The reasons it might be "strong-downvote-worthy had it appeared in most other possible contexts" still apply here, and the costs of replacing it with a less-bad example seem fairly minimal.

Comment by liam-donovan on Deleted · 2019-10-23T16:11:12.491Z · LW · GW

I think the US is listed because it's mandatory that we register for the draft

Comment by liam-donovan on Deleted · 2019-10-23T15:39:49.095Z · LW · GW
Euthanasia should be a universal right.

This doesn't sound non-normative at all?

Comment by liam-donovan on What does "meta-execution without indirection" look like? · 2019-10-21T14:38:10.378Z · LW · GW

My current best guess at what "HCH + annotated functional programming" with no indirection looks like:

Instead of initializing the tree with the generic question "what should the agent do next", you initialize the tree with the specific question you want an answer for. In the context of IDA, I think (??) this would be a question sampled from the distribution of questions you want the IDA agent to be able to answer well.

Is it fair to say the HCH + AFP part mainly achieves capability amplification, and the indirection part mainly achieves security amplification?

Edit: apparently I somehow asked the original question under a different account name that I've never used before? In case anyone finds this weird/confusing: both Liam Donovans are the same person, but this is the account I normally use.

Comment by liam-donovan on What are the differences between all the iterative/recursive approaches to AI alignment? · 2019-10-21T10:26:08.661Z · LW · GW

Huh, I thought that all amplification/distillation procedures were intended as a way to approximate HCH, which is itself a tree. Can you not meaningfully discuss "this amplification procedure is like an n-depth approximation of HCH at step x", for any amplification procedure?

For example, the internal structure of the distilled agent described in Christiano's paper is unlikely to look anything like a tree. However, my (potentially incorrect?) impression is that the agent's capabilities at step x are identical to an HCH tree of depth x if the underlying learning system is arbitrarily capable.

It's possible that I'm not understanding the difference between "depth", "tree-based" and "recursion" in this context

Comment by liam-donovan on What are the differences between all the iterative/recursive approaches to AI alignment? · 2019-10-21T10:23:29.414Z · LW · GW

Huh, what would you recommend I do to reduce my uncertainty around meta-execution (e.g. "read x", "ask about it as a top level question", etc)?

Comment by liam-donovan on What are the differences between all the iterative/recursive approaches to AI alignment? · 2019-10-20T22:00:52.869Z · LW · GW

Is this necessarily true? It seems like this describes what Christiano calls "delegation" in his paper, but wouldn't apply to IDA schemes with other capability amplification methods (such as the other examples in the appendix of "Capability Amplification").