Posts

ARC's first technical report: Eliciting Latent Knowledge 2021-12-14T20:09:50.209Z
ARC is hiring! 2021-12-14T20:09:33.977Z
Your Time Might Be More Valuable Than You Think 2021-10-18T00:55:03.380Z
The Simulation Hypothesis Undercuts the SIA/Great Filter Doomsday Argument 2021-10-01T22:23:23.488Z
Fractional progress estimates for AI timelines and implied resource requirements 2021-07-15T18:43:10.163Z
Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress. 2021-07-08T22:14:23.374Z
Anthropic Effects in Estimating Evolution Difficulty 2021-07-05T04:02:18.242Z
An Intuitive Guide to Garrabrant Induction 2021-06-03T22:21:41.877Z
Rogue AGI Embodies Valuable Intellectual Property 2021-06-03T20:37:30.805Z
Intermittent Distillations #3 2021-05-15T07:13:24.438Z
Pre-Training + Fine-Tuning Favors Deception 2021-05-08T18:36:06.236Z
Less Realistic Tales of Doom 2021-05-06T23:01:59.910Z
Agents Over Cartesian World Models 2021-04-27T02:06:57.386Z
[Linkpost] Treacherous turns in the wild 2021-04-26T22:51:44.362Z
Intermittent Distillations #2 2021-04-14T06:47:16.356Z
Transparency Trichotomy 2021-03-28T20:26:34.817Z
Intermittent Distillations #1 2021-03-17T05:15:27.117Z
Strong Evidence is Common 2021-03-13T22:04:40.538Z
Open Problems with Myopia 2021-03-10T18:38:09.459Z
Towards a Mechanistic Understanding of Goal-Directedness 2021-03-09T20:17:25.948Z
Coincidences are Improbable 2021-02-24T09:14:11.918Z
Chain Breaking 2020-12-29T01:06:04.122Z
Defusing AGI Danger 2020-12-24T22:58:18.802Z
TAPs for Tutoring 2020-12-24T20:46:50.034Z
The First Sample Gives the Most Information 2020-12-24T20:39:04.936Z
Does SGD Produce Deceptive Alignment? 2020-11-06T23:48:09.667Z
What posts do you want written? 2020-10-19T03:00:26.341Z
The Solomonoff Prior is Malign 2020-10-14T01:33:58.440Z
What are objects that have made your life better? 2020-05-21T20:59:27.653Z
What are your greatest one-shot life improvements? 2020-05-16T16:53:40.608Z
Training Regime Day 25: Recursive Self-Improvement 2020-04-29T18:22:03.677Z
Training Regime Day 24: Resolve Cycles 2 2020-04-28T19:00:09.060Z
Training Regime Day 23: TAPs 2 2020-04-27T17:37:15.439Z
Training Regime Day 22: Murphyjitsu 2 2020-04-26T20:18:50.505Z
Training Regime Day 21: Executing Intentions 2020-04-25T22:16:04.761Z
Training Regime Day 20: OODA Loop 2020-04-24T18:11:30.506Z
Training Regime Day 19: Hamming Questions for Potted Plants 2020-04-23T16:00:10.354Z
Training Regime Day 18: Negative Visualization 2020-04-22T16:06:46.138Z
Training Regime Day 17: Deflinching and Lines of Retreat 2020-04-21T17:45:34.766Z
Training Regime Day 16: Hamming Questions 2020-04-20T14:51:31.310Z
Mark Xu's Shortform 2020-03-10T08:11:23.586Z
Training Regime Day 16: Hamming Questions 2020-03-01T18:46:32.335Z
Training Regime Day 15: CoZE 2020-02-29T17:13:42.685Z
Training Regime Day 14: Traffic Jams 2020-02-28T17:52:28.354Z
Training Regime Day 13: Resolve Cycles 2020-02-27T17:45:07.845Z
Training Regime Day 12: Focusing 2020-02-26T19:07:15.407Z
Training Regime Day 11: Socratic Ducking 2020-02-25T17:19:57.320Z
Training Regime Day 10: Systemization 2020-02-24T17:20:15.385Z
Training Regime Day 9: Double-Crux 2020-02-23T18:08:31.108Z
Training Regime Day 8: Noticing 2020-02-22T19:47:03.898Z

Comments

Comment by Mark Xu (mark-xu) on Prizes for ELK proposals · 2022-01-14T20:18:43.241Z · LW · GW

We generally imagine that it's impossible to map the predictor's net directly to an answer, because the predictor is thinking in terms of different concepts, so it has to map to the human's nodes first in order to answer human questions about diamonds and such.

Comment by Mark Xu (mark-xu) on Prizes for ELK proposals · 2022-01-13T17:28:56.209Z · LW · GW

The SmartFabricator seems basically the same. In the robber example, you might imagine the SmartVault is the one that puts up the screen to conceal the fact that it let the diamond get stolen.

Comment by Mark Xu (mark-xu) on Prizes for ELK proposals · 2022-01-13T05:05:52.530Z · LW · GW

Looks good to me.

Comment by Mark Xu (mark-xu) on Prizes for ELK proposals · 2022-01-07T19:27:26.184Z · LW · GW

Yes. The section "Strategy: have a human operate the SmartVault and ask them what happened" describes what I think you're asking about.

Comment by Mark Xu (mark-xu) on Prizes for ELK proposals · 2022-01-07T04:06:11.697Z · LW · GW

A different way of phrasing Ajeya's response, which I think is roughly accurate, is that if you have a reporter that gives consistent answers to questions, you've learned a fact about the predictor, namely "the predictor was such that when it was paired with this reporter it gave consistent answers to questions." If there were 8 predictors for which this fact was true, then "it's the [7th] predictor such that when it was paired with this reporter it gave consistent answers to questions" is enough information to uniquely determine the predictor, i.e. the previous fact + 3 additional bits was enough. If the predictor was 1000 bits, the fact that it was consistent with a reporter "saved" you 997 bits, compressing the predictor into 3 bits.

The hope is that maybe the honest reporter "depends" on larger parts of the predictor's reasoning, so fewer predictors are consistent with it, so the fact that a predictor is consistent with the honest reporter allows you to compress the predictor more. As such, searching for reporters that most compress the predictor would prefer the honest reporter. However, the best way for a reporter to compress a predictor is to simply memorize the entire thing, so if the predictor is simple enough and the gap between the complexity of the human-imitator and the direct translator is large enough, then the human-imitator + memorized predictor is the simplest thing that maximally compresses the predictor.
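
To spell out the counting argument with a toy calculation (the 8 and 1000 are just the figures from the example above):

```python
import math

# Toy version of the counting argument: specifying "the k-th predictor consistent
# with this reporter" costs log2(#consistent predictors) bits.
predictor_bits = 1000        # description length of the predictor
consistent_predictors = 8    # predictors that give consistent answers with this reporter

index_bits = math.log2(consistent_predictors)   # 3 bits to pick one of the 8
saved_bits = predictor_bits - index_bits        # 997 bits "saved" by the consistency fact

print(index_bits, saved_bits)  # 3.0 997.0
```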

Comment by Mark Xu (mark-xu) on Prizes for ELK proposals · 2022-01-07T03:57:27.742Z · LW · GW

[deleted]

Comment by Mark Xu (mark-xu) on Prizes for ELK proposals · 2022-01-07T03:54:27.092Z · LW · GW

There is a distinction between the way that the predictor is reasoning and the way that the reporter works. Generally, we imagine that the predictor is trained the same way the "unaligned benchmark" we're trying to compare to is trained, and the reporter is the thing that we add onto that to "align" it (perhaps by only training another head on the model, perhaps by finetuning). Hopefully, the cost of training the reporter is small compared to the cost of the predictor (maybe like 10% or something).

In this frame, doing anything to change the way the predictor is trained results in a big competitiveness hit, e.g. forcing the predictor to use the same ontology as a human is potentially going to prevent it from using concepts that make reasoning much more efficient. However, training the reporter in a different way, e.g. doubling the cost of training the reporter, only takes you from 10% of the cost of the predictor to 20%, which is not that bad of a competitiveness hit (assuming that the human imitator takes 10% of the cost of the original predictor to train).

In summary, competitiveness for ELK proposals primarily means that you can't change the way the predictor was trained. We are already assuming/hoping the reporter is much cheaper to train than the predictor, so making the reporter harder to train results in a much smaller competitiveness hit.
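
As a rough sketch of the arithmetic, taking the 10% figure above as an assumption:

```python
# Rough illustration of the competitiveness arithmetic (the 10% reporter cost is
# the assumption from the comment, not a measured number).
predictor_cost = 1.0
reporter_cost = 0.1 * predictor_cost

baseline_total = predictor_cost + reporter_cost         # 1.1x the predictor cost
doubled_reporter = predictor_cost + 2 * reporter_cost   # 1.2x the predictor cost

print(doubled_reporter / baseline_total)  # ~1.09, i.e. only a ~9% hit to total training cost
```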

Comment by Mark Xu (mark-xu) on ARC's first technical report: Eliciting Latent Knowledge · 2021-12-24T02:29:45.571Z · LW · GW

I think that problem 1 and problem 2 as you describe them are potentially talking about the same phenomenon. I'm not sure I'm understanding correctly, but I think I would make the following claims:

  • Our notion of narrowness is that we are interested in solving the problem where the question we're asking is such that a state always unambiguously resolves the question. E.g. there isn't any ambiguity around whether a state "really contains a diamond". (Note that there is ambiguity around whether the human could detect the diamond from any set of observations, because there could be a fake diamond or nanobots filtering what the human sees.) It might be useful to think of this as an empirical claim about diamonds.
  • We are explicitly interested in solving some forms of problem 2, e.g. we're interested in our AI being able to answer questions about the presence/absence of diamonds no matter how alien the world gets. In some sense, we are interested in our AI answering questions the same way a human would answer questions if they "knew what was really going on", but "knew what was really going on" might be a misleading phrase. I'm not imagining "knowing what is really going on" to be a very involved process; intuitively, it means something like "the answer they would give if the sensors are 'working as intended'". In particular, I don't think that, for the case of the diamond, "Further judgement, deliberation, and understanding is required to determine what the answer should be in these strange worlds."
    • We want to solve these versions of problem 2 because the speed at which things get weirder in the world might be much faster than the human's ability to understand what's going on in the world. In these worlds, we want to leverage the fact that answers to "narrow" questions are unambiguous to incentivize our AIs to give humans a locally understandable environment in which to deliberate.
  • We're not interested in solving forms of problem 2 where the human needs to do additional deliberation to know what the answer to the question "should" be. E.g. in ship-of-theseus situations where the diamond is slowly replaced, we aren't expecting our AI to answer "is that the same diamond as before?" using the resolution of ship-of-theseus style situations that a human would arrive at with additional deliberation. We are, however, expecting that the answer to the question "does the diamond look the way it does because of the 'normal' causal reasons?" is "no" because the reason is something like "[incomprehensible process] replaced bits of the diamond with identical bits slowly", which is definitely not why diamonds normally continue looking the same.

Comment by Mark Xu (mark-xu) on ARC's first technical report: Eliciting Latent Knowledge · 2021-12-21T03:55:42.381Z · LW · GW

Thanks for your proposal! We have considered similar strategies in the past. The main points of the breaker response would be:

  • How are you training this "terrifier"? If the moves the terrifier can make during training all come from held-out sensors that you have constructed, then the "bad terrifier" you might learn doesn't generalize to thinking of any new sensors, only ever considering the sensors that you used as ground truth during training. Thus a sufficiently clever robber might trick all of the sensors-used-during-training (which may not exist during deployment), and the diamond would be stolen without the terrifier being able to show any sensor data that would scare the human.
  • It's possible that even if the human was able to know the results of all possible sensors, they still wouldn't be able to know whether the situation was fine. For instance, if the human doesn't really understand how humans work, the way that they hope to maintain control over the situation is by asking questions like "is that human the same human that I saw before?". However, a terrifier might not have any sensor data that they could show the human to scare them, e.g. if the terrifier showed the human a bunch of nanotech in that human's brain, the human might not understand why that was bad.

For example, if the thieves replace the diamond with an exact copy down to the nanometer but not molecular level... that's fine? I consider that behavior aligned.

This behavior seems potentially catastrophic. If all we can do is force potential adversaries to take actions that can't be seen with the resolution of our current sensors, then e.g. we can't prevent adversaries from corrupting code that is running on computers that we wish to protect.

How this defeats the previous counterexample: in some sense, I see the problem with "ontology mismatch" as the human no longer knowing the right question to ask, because the AI and human no longer agree on an ontology that would make the human's question meaningful. But "what question would the human be maximally upset about if they asked" is a pretty straightforward task to expect from an AI that's capable of modeling (most) human preferences.

I don't really understand how this explains why your strategy defeats the previous counterexample.

Comment by Mark Xu (mark-xu) on ARC's first technical report: Eliciting Latent Knowledge · 2021-12-20T18:25:12.166Z · LW · GW

My point is either that:

  • it will always be possible to find such an experiment for any action, even desirable ones, because the AI will have defended the diamond in a way the human didn't understand or the AI will have deduced some property of diamonds that humans thought they didn't have
  • or there will be some tampering for which it's impossible to find an experiment, because in order to avoid the above problem, you will have to restrict the space of experiments

Comment by Mark Xu (mark-xu) on ARC's first technical report: Eliciting Latent Knowledge · 2021-12-17T18:28:55.809Z · LW · GW

Thanks for your proposal! I'm not sure I understand how the "human is happy with experiment" part is supposed to work. Here are some thoughts:

  • Eventually, it will always be possible to find experiments where the human confidently predicts wrongly. Situations I have in mind are ones where your AI understands the world far better than you, so it can predict that e.g. combining these 1000 chemicals will produce self-replicating protein assemblages, whereas the human's best guess is going to be "combining 1000 random chemicals doesn't do anything".
  • If the human is unhappy with experiments that are complicated, then advanced ways of hacking the video feed that require experiments of comparable complexity to reveal are not going to be permitted. For instance, if the diamond gets replaced by a fake, one might have to perform a complicated imaging technique to determine the difference. If the human doesn't already understand this technique, then they might not be happy with the experiment.
  • If the human doesn't really understand the world that well, then it might not be possible to find an experiment for which the human is confident in the outcome that distinguishes the diamond from a fake. For instance, if a human gets swapped out for a copy of a human that will make subtly different moral judgments because of factors the human doesn't understand, this copy will be identical in all ways that a human can check, e.g. there will be no experiment that a human is confident in that will distinguish the copy of the human from the real thing.

Comment by Mark Xu (mark-xu) on ARC's first technical report: Eliciting Latent Knowledge · 2021-12-15T01:14:19.874Z · LW · GW

We don't think that real humans are likely to be using Bayes nets to model the world. We make this assumption for much the same reasons that we assume models use Bayes nets, namely that it's a test case where we have a good sense of what we want a solution to ELK to look like. We think the arguments given in the report will basically extend to more realistic models of how humans reason (or rather, we aren't aware of a concrete model of how humans reason for which the arguments don't apply).

If you think there's a specific part of the report where the human Bayes net assumption seems crucial, I'd be happy to try to give a more general form of the argument in question.

Comment by Mark Xu (mark-xu) on The Plan · 2021-12-12T18:38:57.536Z · LW · GW

Agreed, but the thing you want to use this for isn’t simulating a long reflection, which will fail (in the worst case) because HCH can’t do certain types of learning efficiently.

Comment by Mark Xu (mark-xu) on The Plan · 2021-12-12T08:40:34.455Z · LW · GW

I want to flag that HCH was never intended to simulate a long reflection. Its main purpose (which it fails to achieve in the worst case) is to let humans be epistemically competitive with the systems you’re trying to train.

Comment by Mark Xu (mark-xu) on Biology-Inspired AGI Timelines: The Trick That Never Works · 2021-12-05T01:37:48.934Z · LW · GW

The way that you would think about NN anchors in my model (caveat that this isn't my whole model):

  • You have some distribution over 2020-FLOPS-equivalent that TAI needs.
  • Algorithmic progress means that 20XX-FLOPS convert to 2020-FLOPS-equivalent at some 1:N ratio.
  • The function from 20XX to the 1:N ratio is relatively predictable, e.g. a "smooth" exponential with respect to time.
  • Therefore, even though current algorithms will hit diminishing marginal returns (DMR), the transition to the next algorithm that has less DMR is also predictably going to be some constant ratio better at converting current-FLOPS to 2020-FLOPS-equivalent.

E.g. in (some smallish) parts of my view, you take observations like "AGI will use compute more efficiently than human brains" and can ask questions like "but how much is the efficiency of compute->cognition increasing over time?" and draw that graph and try to extrapolate. Of course, the main trouble is in trying to estimate the original distribution of 2020-FLOPS-equivalent needed for TAI, which might go astray in the same way an estimate of the 1950-watt-equivalent needed for TAI would go astray.
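
A minimal sketch of this kind of extrapolation, purely for illustration — the distribution, growth rates, and 2020 compute level below are made-up assumptions, not estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up distribution (in log10 space) over the 2020-FLOPS-equivalent needed for TAI.
log10_flops_needed = rng.normal(loc=35, scale=3, size=100_000)

def log10_available_2020_equivalent(year,
                                    log10_flops_2020=24.0,  # assumed compute available in 2020
                                    hardware_growth=0.3,    # assumed OOMs/year of physical FLOPs
                                    algo_progress=0.2):     # assumed OOMs/year from the 1:N conversion ratio
    """Convert physical 20XX-FLOPS into 2020-FLOPS-equivalent via a smooth exponential."""
    return log10_flops_2020 + (hardware_growth + algo_progress) * (year - 2020)

for year in [2030, 2040, 2050]:
    p = np.mean(log10_flops_needed <= log10_available_2020_equivalent(year))
    print(year, f"P(enough 2020-FLOPS-equivalent) = {p:.2f}")
```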

Comment by Mark Xu (mark-xu) on Biology-Inspired AGI Timelines: The Trick That Never Works · 2021-12-03T20:01:26.607Z · LW · GW

My model is something like:

  • For any given algorithm, e.g. SVMs, AlphaGo, alpha-beta pruning, convnets, etc., there is an "effective compute regime" where dumping more compute makes them better. If you go above this regime, you get steep diminishing marginal returns.
  • In the (relatively small) effective regimes of old algorithms, new algorithms and old algorithms perform similarly. E.g. with small amounts of compute, using AlphaGo instead of alpha-beta pruning doesn't get you much better performance than something like an extra OOM of compute would (I have no idea if this is true; the example is more because it conveys the general gist).
  • One of the main ways that modern algorithms are better is that they have much larger effective compute regimes. The other main way is enabling more effective conversion of compute to performance.
  • Therefore, one of the primary impacts of new algorithms is to enable performance to continue scaling with compute the same way it did when you had smaller amounts.

In this model, it makes sense to think of the "contribution" of new algorithms as the factor by which they enable more efficient conversion of compute to performance, and to count the increased performance that comes from new algorithms being able to absorb more compute as primarily hardware progress. I think the studies that Carl cites above are decent evidence that the multiplicative factor of compute -> performance conversion you get from new algorithms is smaller than the historical growth in compute, so it further makes sense to claim that most progress came from compute, even though the algorithms were what "unlocked" the compute.

For an example of something I think supports this model, see the LSTM versus Transformer graphs in https://arxiv.org/pdf/2001.08361.pdf
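
To make the picture concrete, here is a toy sketch of the model (the functional form, efficiency factors, and regime caps are invented for illustration only):

```python
import numpy as np

def performance(log_compute, efficiency, regime_cap):
    """Toy model: performance scales with log-compute inside an algorithm-specific
    'effective compute regime', then hits steep diminishing marginal returns."""
    in_regime = np.minimum(log_compute, regime_cap)
    overflow = np.maximum(log_compute - regime_cap, 0.0)
    return efficiency * in_regime + 0.05 * overflow  # returns flatten out past the cap

log_compute = np.linspace(0, 12, 7)  # orders of magnitude of compute

old_algo = performance(log_compute, efficiency=1.0, regime_cap=4.0)   # e.g. alpha-beta pruning
new_algo = performance(log_compute, efficiency=1.3, regime_cap=10.0)  # e.g. AlphaGo-style methods

# At small compute the two are close; the new algorithm mostly wins by continuing
# to absorb compute (larger regime_cap), not by its modest efficiency factor.
for c, old, new in zip(log_compute, old_algo, new_algo):
    print(f"compute=1e{c:.0f}: old={old:.2f}, new={new:.2f}")
```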

Comment by Mark Xu (mark-xu) on TurnTrout's shortform feed · 2021-10-01T23:13:20.855Z · LW · GW

https://en.wikipedia.org/wiki/Deadweight_loss#Harberger's_triangle

Comment by Mark Xu (mark-xu) on johnswentworth's Shortform · 2021-10-01T22:20:53.996Z · LW · GW

In general, Baumol-type effects (spending decreasing in sectors where productivity goes up) mean that we can have scenarios in which the economy is growing extremely fast on "objective" metrics like energy consumption, but GDP has stagnated because that energy is being spent on extremely marginal increases in goods being bought and sold.

Comment by Mark Xu (mark-xu) on johnswentworth's Shortform · 2021-10-01T22:18:17.249Z · LW · GW

A similar point is made by Korinek in his review of "Could Advanced AI Drive Explosive Economic Growth?":

My first reaction to the framing of the paper is to ask: growth in what? It’s important to keep in mind that concepts like “gross domestic product” and “world gross domestic product” were defined from an explicit anthropocentric perspective - they measure the total production of final goods within a certain time period. Final goods are what is either consumed by humans (e.g. food or human services) or what is invested into “capital goods” that last for multiple periods (e.g. a server farm) to produce consumption goods for humans.

Now imagine you are a highly intelligent AI system running on the cloud. Although the production of the server farms on which you depend enters into human GDP (as a capital good), most of the things that you absorb, for example energy, server maintenance, etc., count as “intermediate goods” in our anthropocentric accounting systems and do not contribute to human GDP. In fact, to the extent that the AI system drives up the price of scarce resources (like energy) consumed by humans, real human GDP may even decline.

As a result, it is conceivable (and, to be honest, one of the central scenarios for me personally) that an AI take-off occurs but anthropocentric GDP measures show relative stagnation in the human economy.

To make this scenario a bit more tangible, consider the following analogy: imagine a world in which there are two islands trading with each other, but the inhabitants of the islands are very different from each other - let’s call them humans and AIs. The humans sell primitive goods like oil to the AIs and their level of technology is relatively stagnant. The AIs sell amazing services to the humans, and their level of technology doubles every year. However, the AI services that humans consume make up only a relatively small part of the human consumption basket. The humans are amazed at what fantastic services they get from the AIs in exchange for their oil, and they experience improvements in their standard of living from these fantastic AI services, although they also have to pay more and more for their energy use every year, which offsets part of that benefit. The humans can only see what’s happening on their own island and develop a measure of their own well-being that they call human GDP, which increases modestly because the advances only occur in a relatively small part of their consumption basket. The AIs can see what’s going on on the AI island and develop a measure of their own well-being which they call AI GDP, and which almost doubles every year. The system can go on like this indefinitely.

For a fuller discussion of these arguments, let me refer you to my working paper on “The Rise of Artificially Intelligent Agents” (with the caveat that the paper is still a working draft).

Comment by Mark Xu (mark-xu) on Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress. · 2021-07-21T23:14:12.825Z · LW · GW

Yeah that seems like a reasonable example of a good that can't be automated.

I think I'm mostly interested in whether these sorts of goods that seem difficult to automate will be a pragmatic constraint on economic growth. It seems clear that they'll eventually be the ultimate binding constraints as long as we don't get massive population growth, but it's a separate question whether they'll start being constraints early enough to prevent rapid AI-driven economic growth.

Comment by Mark Xu (mark-xu) on The topic is not the content · 2021-07-09T20:24:16.949Z · LW · GW

Related: https://forum.effectivealtruism.org/posts/GfWJCF3rqdW48D7k8/be-specific-about-your-career

You might consider cross-posting this to the EA forum.

Comment by Mark Xu (mark-xu) on Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress. · 2021-07-09T19:00:07.871Z · LW · GW

Thanks! I will try, although they will likely stay very intermittent.

Comment by Mark Xu (mark-xu) on rohinmshah's Shortform · 2021-06-26T23:43:57.027Z · LW · GW

My house implemented such a tax.

Re 1, we ran into some of the issues Matthew brought up, but all other COVID policies are implicitly valuing risk at some dollar amount (possibly inconsistently), so the Pigouvian tax seemed like the best option available.

Comment by Mark Xu (mark-xu) on Precognition · 2021-06-15T16:47:39.970Z · LW · GW

I'd be interested to see the rest of this list, if you're willing to share.

Comment by Mark Xu (mark-xu) on Rogue AGI Embodies Valuable Intellectual Property · 2021-06-12T23:34:06.463Z · LW · GW

Yeah, I'm really not sure how the monopoly -> non-monopoly dynamics play out in practice. In theory, perfect competition should drive the price down to the marginal cost of production, which is very low for software. I briefly tried getting empirical data for this, but couldn't find it, plausibly since I didn't really know the right search terms.

Comment by Mark Xu (mark-xu) on An Intuitive Guide to Garrabrant Induction · 2021-06-06T23:02:09.636Z · LW · GW

Both of those sections draw from Section 7.2 of the original paper.

Comment by Mark Xu (mark-xu) on An Intuitive Guide to Garrabrant Induction · 2021-06-05T00:32:58.427Z · LW · GW

Yes, and there will always exist such a trader.

Comment by Mark Xu (mark-xu) on An Intuitive Guide to Garrabrant Induction · 2021-06-03T22:49:45.489Z · LW · GW

Thanks! Should be fixed now.

Comment by Mark Xu (mark-xu) on How refined is your art of note-taking? · 2021-05-20T19:11:47.445Z · LW · GW

It’s based on bullet points, which I find helpful. It also lets me reference other notes I’ve taken.

I like the idea of question notes. Thanks for the tip!

Comment by Mark Xu (mark-xu) on How refined is your art of note-taking? · 2021-05-20T05:45:00.802Z · LW · GW

The particular technology stack I use for notes on reading is {Instapaper, PDF Expert on iPad} -> Readwise -> Roam Research -> Summarize it.

To answer your specific questions:

  1. If I plan on summarizing, I tend to only highlight important bits. I write down any connections I make with other concepts. Readwise reminds me of 15 highlights I've taken in the past each day; I've been doing this for about half a year. I'm not sure if it's helpful, but the time cost is low, so I continue.

  2. Sometimes, if I want to know what I thought about specific posts. If it's just high-level concepts, I'll generally just skim the relevant material. If I find myself looking something up more than twice, I'll put it into Anki.

  3. No, but I only really study technical things. I find it difficult to summarize/remember history, plausibly because I don't change the way I take notes.

  4. Roam Research seems pretty good. RemNote is similar and incorporates more spaced repetition. SuperMemo allows one to create flashcards as they read (Readwise does something similar, but the functionality is worse, I think [I've never used SuperMemo, but plan to try it]).

  5. Attention, future reference, and comprehension are all goals. The primary goal seems to be forcing connections with other ideas and forcing myself to have an opinion about what I'm reading at all.

Comment by Mark Xu (mark-xu) on Pre-Training + Fine-Tuning Favors Deception · 2021-05-09T01:57:58.700Z · LW · GW

thanks, fixed

Comment by mark-xu on [deleted post] 2021-05-01T00:31:18.232Z

Can you be more specific?

Comment by Mark Xu (mark-xu) on AMA: Paul Christiano, alignment researcher · 2021-04-29T17:28:57.245Z · LW · GW

engine-game.com, a game that Paul develops

Comment by Mark Xu (mark-xu) on AMA: Paul Christiano, alignment researcher · 2021-04-29T05:28:40.283Z · LW · GW

How would you teach someone how to get better at the engine game?

Comment by Mark Xu (mark-xu) on AMA: Paul Christiano, alignment researcher · 2021-04-29T02:58:25.341Z · LW · GW

You've written multiple outer alignment failure stories. However, you've also commented that these aren't your best predictions. If you condition on humanity going extinct because of AI, why did it happen?

Comment by Mark Xu (mark-xu) on [Linkpost] Treacherous turns in the wild · 2021-04-27T18:01:31.841Z · LW · GW

This is a cool example, thanks!

Comment by Mark Xu (mark-xu) on Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers · 2021-04-14T22:52:58.870Z · LW · GW

I'm curious what "put it in my SuperMemo" means. Quick googling only yielded SuperMemo as a language learning tool.

Comment by Mark Xu (mark-xu) on Transparency Trichotomy · 2021-03-28T22:12:11.478Z · LW · GW

I agree it's sort of the same problem under the hood, but I think knowing how you're going to go from "understanding understanding" to producing an understandable model controls what type of understanding you're looking for.

I also agree that this post makes ~0 progress on solving the "hard problem" of transparency, I just think it provides a potentially useful framing and creates a reference for me/others to link to in the future.

Comment by Mark Xu (mark-xu) on Strong Evidence is Common · 2021-03-15T03:13:22.453Z · LW · GW

Yeah, I agree 95% is a bit high.

Comment by Mark Xu (mark-xu) on Open Problems with Myopia · 2021-03-11T23:38:25.355Z · LW · GW

One way of looking at DDT is "keeping it dumb in various ways." I think another way of thinking about it is just designing a different sort of agent, which is "dumb" according to us but not really dumb in an intrinsic sense. You can imagine this DDT agent looking at agents that do do acausal trade and thinking they're just sacrificing utility for no reason.

There is some slight awkwardness in that the decision problems agents in this universe actually encounter mean that UDT agents will get higher utility than DDT agents.

I agree that the maximum a posteriori world doesn't help that much, but I think there is some sense in which "having uncertainty" might be undesirable.

Comment by Mark Xu (mark-xu) on Open Problems with Myopia · 2021-03-11T20:08:15.505Z · LW · GW

This has been changed to imitation, as suggested by Evan.

Comment by Mark Xu (mark-xu) on Open Problems with Myopia · 2021-03-10T19:55:39.105Z · LW · GW

Yeah, you're right that it's obviously unsafe. The words "in theory" were meant to gesture at that, but it could be much better worded. Changed to "A prototypical example is a time-limited myopic approval-maximizing agent. In theory, such an agent has some desirable safety properties because a human would only approve safe actions (although we still would consider it unsafe)."

Comment by Mark Xu (mark-xu) on Open Problems with Myopia · 2021-03-10T19:52:10.922Z · LW · GW

Yep - I switched the setup at some point and forgot to switch this sentence. Thanks.

Comment by Mark Xu (mark-xu) on Coincidences are Improbable · 2021-02-25T00:48:50.346Z · LW · GW

This is brilliant.

Comment by Mark Xu (mark-xu) on Coincidences are Improbable · 2021-02-24T19:43:19.545Z · LW · GW

I am using the word "causal" to mean d-connected, which means not d-separated. I prefer the term "directly causal" to mean A->B or B->A.

In the case of non-effects, the improbable events are "taking Benadryl" and "not reacting after consuming an allergen".

Comment by Mark Xu (mark-xu) on DanielFilan's Shortform Feed · 2021-02-14T21:51:42.945Z · LW · GW

I agree market returns are equal in expectation, but you're exposing yourself to more risk for the same expected returns in the "I pick stocks" world, so risk-adjusted returns will be lower.
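
A quick illustration with idealized i.i.d. stock returns (the numbers are made up, and real stocks are correlated, so the effect is smaller in practice):

```python
import numpy as np

rng = np.random.default_rng(0)

n_years, n_stocks = 10_000, 50
mean, vol = 0.07, 0.3  # assumed identical expected return and volatility per stock

returns = rng.normal(mean, vol, size=(n_years, n_stocks))

single_stock = returns[:, 0]      # "I pick stocks" world
market = returns.mean(axis=1)     # equal-weight "market" portfolio

for name, r in [("single stock", single_stock), ("market", market)]:
    sharpe = r.mean() / r.std()   # crude risk-adjusted return
    print(f"{name}: mean={r.mean():.3f}, std={r.std():.3f}, sharpe={sharpe:.2f}")
```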

Comment by Mark Xu (mark-xu) on Ways to be more agenty? · 2021-01-05T15:01:04.782Z · LW · GW

I sometimes roleplay as someone roleplaying as myself, then take the action that I would obviously want to take, e.g. "wow, sleeping regularly gives my character +1 INT!" and "using Anki every day makes me level up 1% faster!"

Comment by Mark Xu (mark-xu) on Collider bias as a cognitive blindspot? · 2020-12-31T16:10:03.317Z · LW · GW

If X->Z<-Y, then X and Y are independent unless you're conditioning on Z. A relevant TAP might thus be:

  • Trigger: I notice that X and Y seem statistically dependent
  • Action: Ask yourself "what am I conditioning on?". Follow up with "Are any of these factors causally downstream of both X and Y?" Alternatively, you could list salient things causally downstream of either X or Y and check whether they're also downstream of the other.

This TAP is unfortunately abstract because "things I'm currently conditioning on" isn't an easy thing to list, but it might help.
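
A small simulation of this collider effect (variable names and numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X and Y are independent causes of the collider Z (X -> Z <- Y).
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + rng.normal(scale=0.1, size=n)

# Unconditionally, X and Y are (nearly) uncorrelated.
print("corr(X, Y):", np.corrcoef(x, y)[0, 1])

# Conditioning on Z (here: selecting a narrow slice of Z values) induces a
# spurious negative dependence between X and Y — collider / selection bias.
mask = np.abs(z - 1.0) < 0.1
print("corr(X, Y | Z ~= 1):", np.corrcoef(x[mask], y[mask])[0, 1])
```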

Comment by Mark Xu (mark-xu) on Chain Breaking · 2020-12-29T20:21:21.553Z · LW · GW

yep, thanks

Comment by Mark Xu (mark-xu) on Great minds might not think alike · 2020-12-29T16:49:54.845Z · LW · GW

Here are some possibilities:

  • great minds might not think alike
  • untranslated thinking sounds untrustworthy
  • disagreement as a lack of translation