Posts

Heresies in the Shadow of the Sequences 2024-11-14T05:01:11.889Z
Compelling Villains and Coherent Values 2024-10-06T19:53:47.891Z
Cole Wyeth's Shortform 2024-09-28T15:26:37.548Z
Open Problems in AIXI Agent Foundations 2024-09-12T15:38:59.007Z
Free Will and Dodging Anvils: AIXI Off-Policy 2024-08-29T22:42:24.485Z
Sherlockian Abduction Master List 2024-07-11T20:27:00.000Z
Deliberative Cognitive Algorithms as Scaffolding 2024-02-23T17:15:26.424Z
The Byronic Hero Always Loses 2024-02-22T01:31:59.652Z
Gemini 1.5 released 2024-02-15T18:02:50.711Z
Noticing Panic 2024-02-05T03:45:51.794Z
Suggestions for net positive LLM research 2023-12-13T17:29:11.666Z
Is there a hard copy of the sequences available anywhere? 2023-09-11T19:01:54.980Z
Decision theory is not policy theory is not agent theory 2023-09-05T01:38:27.175Z
Mechanistic Interpretability is Being Pursued for the Wrong Reasons 2023-07-04T02:17:10.347Z
A flaw in the A.G.I. Ruin Argument 2023-05-19T19:40:03.135Z
Is "Regularity" another Phlogiston? 2023-03-12T03:13:44.646Z

Comments

Comment by Cole Wyeth (Amyr) on Sam Rosen's Shortform · 2024-12-17T23:49:40.928Z · LW · GW

This concept seems sufficiently useful that it should have a name

Comment by Cole Wyeth (Amyr) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-11-30T15:23:55.814Z · LW · GW

This post provided far more data than I needed to donate to support a site I use constantly.

Comment by Cole Wyeth (Amyr) on mishka's Shortform · 2024-11-25T23:33:32.172Z · LW · GW

One reason to prefer my position is that LLMs still seem to be bad at the kind of tasks that rely on using serial time effectively. For these ML research style tasks, scaling up to human performance over a couple of hours relied on taking the best of multiple calls, which seems like parallel time. That's not the same as leaving an agent running for a couple of hours and seeing it work out something it previously would have been incapable of guessing (or that really couldn't be guessed, but only discovered through interaction). I do struggle to think of tests like this that I'm confident an LLM would fail, though. Probably it would have trouble winning a text-based RPG? Or more practically speaking, could an LLM file my taxes without committing fraud? How well can LLMs play board games these days?

Comment by Cole Wyeth (Amyr) on Yonatan Cale's Shortform · 2024-11-25T19:06:24.197Z · LW · GW

I think it's net negative - it increases the profitability of training better LLMs. 

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-11-25T19:04:48.415Z · LW · GW

Over-fascination with beautiful mathematical notation is idol worship. 

Comment by Cole Wyeth (Amyr) on mishka's Shortform · 2024-11-24T22:31:53.684Z · LW · GW

I’d like to see the x-axis on this plot scaled by a couple OOMs on a task that doesn’t saturate: https://metr.org/assets/images/nov-2024-evaluating-llm-r-and-d/score_at_time_budget.png
My hunch (and a timeline crux for me) is that human performance actually scales in a qualitatively different way with time, rather than just asymptoting like LLM performance. And even the LLM scaling with time that we do see is an artifact of careful scaffolding. I am a little surprised to see good performance up to the 2-hour mark though. That's longer than I expected. Edit: I guess only another doubling or two would be reasonable to expect.

Comment by Cole Wyeth (Amyr) on A few questions about recent developments in EA · 2024-11-23T18:45:13.041Z · LW · GW

Hmmm, my long term strategy is to build wealth and then do it myself, but I suppose that would require me to leave academia eventually :)

I wonder if MIRI would fund it? Doesn't seem likely.

Comment by Cole Wyeth (Amyr) on Rethinking Laplace's Rule of Succession · 2024-11-23T16:53:53.717Z · LW · GW

Are you aware of the existing work on ignorance priors, for instance the maximum entropy prior (if I remember properly this is the Jeffreys prior and gives rise to the KT estimator), as well as the improper prior which effectively places almost all of the weight on 0 and 1? Interestingly, the universal distribution does not include continuous parameters but does end up dominating any computable rule for assigning probabilities, including these families of conjugate priors.
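
For concreteness, a minimal sketch of the estimators in question (function names are mine; the Jeffreys Beta(1/2, 1/2) prior gives the KT estimator, while Laplace's rule corresponds to the uniform Beta(1, 1) prior):

```python
def kt_estimate(ones: int, zeros: int) -> float:
    """Krichevsky-Trofimov estimator: posterior predictive of the
    Jeffreys Beta(1/2, 1/2) prior for a Bernoulli parameter."""
    return (ones + 0.5) / (ones + zeros + 1.0)

def laplace_estimate(ones: int, zeros: int) -> float:
    """Laplace's rule of succession: uniform Beta(1, 1) prior."""
    return (ones + 1.0) / (ones + zeros + 2.0)

# After observing 7 ones and 3 zeros:
print(kt_estimate(7, 3))       # ~0.682
print(laplace_estimate(7, 3))  # ~0.667
```

(If the improper prior meant here is the Haldane Beta(0, 0) prior, its posterior predictive is just ones / (ones + zeros), which stays pinned to 0 or 1 until both symbols have been observed.)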

Comment by Cole Wyeth (Amyr) on A few questions about recent developments in EA · 2024-11-23T16:25:52.486Z · LW · GW

My intuition is kind of the opposite - I think EA has a less coherent purpose. It's actually kind of a large tent for animal welfare, longtermism, and global poverty. I think some of the divergence in priorities between EAs is about impact assessment / fact-finding, and a lot of ink is spilled on this, but some is probably about values too. I think of EA as very outward-facing, coalitional, and ideally a little pragmatic, so I don't think it's a good basis for an organized totalizing worldview. 

The study of human rationality is a more universal project. It makes sense to have a monastic class that (at least for some years of their life) sets aside politics and refines the craft, perhaps functioning as an impersonal interface when they go out into the world - almost like Bene Gesserit advisors (or a Confessor).

I have thought about building it. The physical building itself would be quite expensive, since the monastery would need to meet many psychological requirements - it would have to be both isolated and starkly beautiful, and also well-provisioned. It's an expense that EA organizations probably couldn't justify (that is, larger and more extravagant than buying a castle). Of course, most of the difficulty would be in creating the culture - but I think that building the monastery properly would go a long way (if you build it, they will come). 

Comment by Cole Wyeth (Amyr) on A few questions about recent developments in EA · 2024-11-23T04:23:33.801Z · LW · GW

I think a really hardcore rationality monastery would be awesome. Seems less useful on the EA side - EAs have to interact with Overton-window-occupying institutions and are probably better off not totalizing too much.

Comment by Cole Wyeth (Amyr) on Six Plausible Meta-Ethical Alternatives · 2024-11-20T19:53:34.942Z · LW · GW

I believe 3 is about right in principle but 5 describes humans today. 

Comment by Cole Wyeth (Amyr) on Probability is Real, and Value is Complex · 2024-11-20T19:45:00.711Z · LW · GW

I don't think this proves probability and utility are inextricable. I prefer Jaynes' approach of motivating probabilities by coherence conditions on beliefs - later, he notes that utility and probability are on equal footing in decision theory as explained in this post, but (as far as I remember) ultimately decides that he can't carry this through to a meaningful philosophy that stands on its own. By choosing to introduce probabilities as conceptually prior, he "extricates" the two in a way that seems perfectly sensible to me. 

Comment by Cole Wyeth (Amyr) on The Obliqueness Thesis · 2024-11-20T17:17:32.854Z · LW · GW

I think that at least the weak orthogonality thesis survives these arguments in the sense that any coherent utility function over an ontology "closely matching" reality should in principle be reachable for arbitrarily intelligent agents, along some path of optimization/learning. Your only point that seems to contradict this is the existence of optimization daemons, but I'm confident that an anti-daemon immune system can be designed, so any agent that chooses to design itself in a way where it can be overtaken by daemons must do this with the knowledge that something close to its values will still be optimized for - so this shouldn't cause much observable shift in values. 

It's unclear how much measure is assigned to various "final/limiting" utility functions by various agent construction schemes - I think this is far beyond our current technical ability to answer.

Personally, I suspect that the angle is more like 60 degrees, not 3.  

Comment by Cole Wyeth (Amyr) on Social events with plausible deniability · 2024-11-19T23:01:51.436Z · LW · GW

“Cancel culture is good actually” needs to go in the hat ;)

Comment by Cole Wyeth (Amyr) on Social events with plausible deniability · 2024-11-19T17:32:31.868Z · LW · GW

You may be right that the benefits are worth the costs for some people, but I think if you have access to a group interested in doing social events with plausible deniability, that group is probably already a place where you should be able to be honest about your beliefs without fear of "cancellation." Then it is preferable to practice (and expect) the moral courage / accountability / honesty of saying what you actually believe and defending it within that group. If you don't have a group of people interested in doing social events with plausible deniability, you probably can't do them and this point is moot. So I'm not sure I understand the use case - you have a friend group that is a little cancel-ish but still interested in expressing controversial beliefs? That sounds like something that is not a rationalist group (or maybe I am spoiled by the culture of Jenn's meetups). 

Comment by Cole Wyeth (Amyr) on Social events with plausible deniability · 2024-11-19T02:30:15.084Z · LW · GW

This kind of thing does justified harm to our community’s reputation. If you have fun arguing that only white people can save us while deliberately obfuscating whether you actually believe that, it is in fact a concerning sign about your intentions/seriousness/integrity/trustworthiness.

Comment by Cole Wyeth (Amyr) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-18T01:41:12.637Z · LW · GW

I don’t believe that these anthropic considerations actually apply, either to us, to oracles, or to Solomonoff induction. The arguments are too informal; it’s very easy to miscalculate Kolmogorov complexities and the measures assigned by the universal distribution using intuitive gestures like this. However, I do think that this is a correct generalization of the idea of a malign prior, and I actually appreciate that you wrote it up this way, because it makes clear that none of the load-bearing parts of the argument actually rely on reliable calculations (invocations of algorithmic information theory concepts have not been reduced to rigorous math, so the original argument is not stronger than this one).

Comment by Cole Wyeth (Amyr) on Heresies in the Shadow of the Sequences · 2024-11-15T15:37:24.375Z · LW · GW

My impression is that e.g. the Catholic church has a pretty deeply thought out moral philosophy that has persisted across generations. That doesn't mean that every individual Catholic understands and executes it properly. 

Comment by Cole Wyeth (Amyr) on Heresies in the Shadow of the Sequences · 2024-11-14T22:44:51.460Z · LW · GW
  • Perhaps Legg-Hutter intelligence.
  • I'm not sure how much the goal matters - probably the details depend on the utility function you want to optimize. I think you can do about as well as possible by carving out a utility function module and designing the rest uniformly to pursue the objectives of that module. But perhaps this comes at a fairly significant cost (i.e. you'd need a somewhat larger computer to get the same performance if you insist on doing it this way).
  • ...And yes, there does exist a computer program which is remarkably good at just chess and nothing else, but that's not the kind of thing I'm talking about here.
  • Yes, the I/O channels should be fixed along with the hardware.
Comment by Cole Wyeth (Amyr) on Heresies in the Shadow of the Sequences · 2024-11-14T20:49:00.916Z · LW · GW

The standard method for training LLMs is next-token prediction with teacher forcing, penalized by the negative log-loss. This is exactly the right setup to elicit calibrated conditional probabilities, and exactly the "prequential problem" that Solomonoff induction was designed for. I don't think this was motivated by decision theory, but it definitely makes perfect sense as an approximation to Bayesian inductive inference - the only missing ingredient is acting to optimize a utility function based on this belief distribution. So I think it's too early to suppose that decision theory won't play a role. 
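
A minimal sketch of that objective, with a toy stand-in for the model (the names and the toy example are mine): the loss is the sum of -log P(x_t | x_<t) along the observed sequence, with the true prefix always fed back in (teacher forcing), which in expectation is minimized by the true conditionals - exactly the prequential setup.

```python
import numpy as np

def teacher_forced_log_loss(token_probs, tokens):
    """Average negative log-likelihood of a sequence under teacher forcing.

    token_probs: callable(prefix) -> dict of {candidate next token: probability},
                 standing in for one forward pass of the model.
    tokens:      the observed sequence; the true prefix is always fed back in,
                 regardless of what the model would have sampled.
    """
    nll = 0.0
    for t in range(1, len(tokens)):  # the first token is taken as given (no BOS here)
        prefix, target = tokens[:t], tokens[t]
        p = token_probs(prefix).get(target, 1e-12)  # model's conditional P(x_t | x_<t)
        nll -= np.log(p)
    return nll / (len(tokens) - 1)

# A toy "model" that always predicts a uniform distribution over two tokens:
uniform = lambda prefix: {"a": 0.5, "b": 0.5}
print(teacher_forced_log_loss(uniform, list("abab")))  # = log(2) ~ 0.693
```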

Comment by Cole Wyeth (Amyr) on Radical Probabilism · 2024-11-11T19:36:16.609Z · LW · GW

What would you have to see proven about Solomonoff induction to conclude it does not have convergence/calibration problems? My friend Aram Ebtekar has worked on showing it converges to 50% on adversarial sequences. 

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-11-11T16:37:52.415Z · LW · GW

Perhaps LLMs are starting to approach the intelligence of today's average human: capable of only limited original thought, unable to select and autonomously pursue a nontrivial coherent goal across time, and having learned almost everything they know from reading the internet ;)

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-11-11T12:33:50.852Z · LW · GW

No that seems paywalled, curious though?

Comment by Cole Wyeth (Amyr) on Inverse Problems In Everyday Life · 2024-11-09T19:46:13.458Z · LW · GW

An example I've been studying obsessively: https://www.lesswrong.com/posts/Yz33koDN5uhSEaB6c/sherlockian-abduction-master-list

Comment by Cole Wyeth (Amyr) on Could randomly choosing people to serve as representatives lead to better government? · 2024-11-09T19:32:35.831Z · LW · GW

How do you suggest advocating for this effectively?

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-11-09T19:19:27.954Z · LW · GW

I'm in Canada so can't access the latest Claude, so my experience with these things does tend to be a couple months out of date. But I'm not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine. 

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-11-09T18:34:29.998Z · LW · GW

I've noticed occasional surprises in that direction, but none of them seem to shake out into utility for me.

Comment by Cole Wyeth (Amyr) on Graceful Degradation · 2024-11-09T18:31:58.243Z · LW · GW

Semi-interestingly, my MMA school taught that it's best for the punch to arrive before the leading foot lands so that the punch carries your full weight. Many people at advanced levels weren't aware of this because we did not introduce it right away - if you try to do this before learning a few other details (and building strength), you run a risk of hurting your wrist by punching too hard. 

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-11-09T17:45:08.444Z · LW · GW

I've been waiting to say this until OpenAI's next larger model dropped, but this has now failed to happen for so long that it's become its own update, and I'd like to state my prediction before it becomes obvious. 

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-11-09T17:42:26.120Z · LW · GW

This doesn't seem to be reflected in the general opinion here, but it seems to me that LLMs are plateauing and possibly have already plateaued a year or so ago. Scores on various metrics continue to go up, but this tends to provide weak evidence because they're heavily gamed and sometimes leak into the training data. Still, those numbers overall would tend to update me towards short timelines, even with their unreliability taken into account - however, this is outweighed by my personal experience with LLMs. I just don't find them useful for practically anything. I have a pretty consistently correct model of the problems they will be able to help me with and it's not a lot - maybe a broad introduction to a library I'm not familiar with or detecting simple bugs. That model has worked for a year or two without expanding the set much. Also, I don't see any applications to anything economically productive except for fluffy chatbot apps. 

Comment by Cole Wyeth (Amyr) on Survival without dignity · 2024-11-04T15:02:43.288Z · LW · GW

I think this is a story about anthropic immortality.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2024-11-03T23:22:42.256Z · LW · GW

Thanks! I am particularly interested in the hook grip calluses on thumbs, I'll look into that.

Calluses at the base of the finger (say, the knuckle-joint of the palm) are in my experience very difficult to classify. I get them there from climbing as you said, and though I also get some calluses on my fingers those tend to be less persistent and probably disappear most of the time (after climbing for a while at my level of intensity I stop getting calluses). I have also seen them from biking - when I started out I used to look at people's palms a lot and never came up with a reliable way to distinguish this from weightlifting. But if you could go into some more detail on the differences, perhaps I'll add a more speculative entry and see how it stands up!

(If it's your first post on lesswrong, welcome! I think you'll find that kindness/politeness is the community norm here) 

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2024-11-02T20:08:16.903Z · LW · GW

I haven't been able to verify that Protestants don't wear a cross on a chain - it seems like they prefer an empty cross to the more Catholic-coded crucifix, but this doesn't seem to be what you meant?

Comment by Cole Wyeth (Amyr) on New intro textbook on AIXI · 2024-10-27T18:37:13.927Z · LW · GW

Technically the connection between the computability levels of AIT (estimability, lower/upper semi-computability, approximability) and the Turing degrees has not been worked out properly. See chapter 6 of Leike's thesis, though there is a small error in the inequalities of section 6.1.2. It is necessary to connect the computability of real-valued functions (Type-2 Theory of Effectivity) to the arithmetic hierarchy - as far as I know this hasn't been done, but maybe I'll share some notes in a few months. 

Roughly, most classes don't have a universal distribution because they are not computably enumerable, though perhaps the reasons vary. There's a nice table in Marcus Hutter's original book, page 50.

It says that (negative log) universal probability is about the same as the (monotone) Kolmogorov complexity - in the discrete case up to a constant multiple. Basically, the Bayesian prediction is closely connected to the shortest explanation. See Li and Vitanyi's "An Introduction to Kolmogorov Complexity and its Applications."
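
In symbols (discrete case, standard notation rather than a quote from the book):

```latex
% Coding theorem: the universal discrete semimeasure m and prefix
% Kolmogorov complexity K agree up to an additive constant in log space,
% i.e. m(x) and 2^{-K(x)} agree up to a constant multiple.
-\log_2 \mathbf{m}(x) \;=\; K(x) + O(1)
```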

Last question is a longer story I guess. Basically, the conditionals of the universal distribution are not lower semi-computable, and it gets even worse when you have to compare the expected values of different outcomes because of tie-breaking. But a good approximation of AIXI can still be computed in the limit.

Comment by Cole Wyeth (Amyr) on New intro textbook on AIXI · 2024-10-27T15:50:59.792Z · LW · GW

Nice things about the universal distribution underlying AIXI include:

  • It is one (lower semi-)computable probabilistic model that dominates, in the measure-theoretic sense, all other (lower semi-)computable probabilistic models (see the sketch after this list). This is not possible to construct for most natural computability levels, so it's neat that it works.
  • Unites compression and prediction through the coding theorem - though this is slightly weaker in the sequential case.
  • It has two very natural characterizations, either as feeding random bits to a UTM or as an explicit mixture of lower semi-computable environments.
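
Concretely, the mixture characterization and the dominance it gives (standard notation, not a quote: $\mathcal{M}$ is the class of lower semi-computable semimeasures and $K(\nu)$ the length of a shortest program for $\nu$):

```latex
\xi(x) \;=\; \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\,\nu(x)
\qquad\Longrightarrow\qquad
\xi(x) \;\ge\; 2^{-K(\nu)}\,\nu(x) \quad \text{for all } \nu \in \mathcal{M}.
```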

With the full AIXI model, Professor Hutter was able to formally extend the probabilistic model to interactive environments without damaging the computability level. Conditioning and planning do damage the computability level but this is fairly well understood and not too bad.

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-10-25T15:01:27.541Z · LW · GW

I'm starting a Google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues that are (infrequently) added - also, some people outside of LessWrong are interested. 

Comment by Cole Wyeth (Amyr) on Interest in Leetcode, but for Rationality? · 2024-10-17T02:04:36.182Z · LW · GW

I would be very interested to see what you come up with!

Comment by Cole Wyeth (Amyr) on Rashomon - A newsbetting site · 2024-10-16T13:16:28.706Z · LW · GW

Would be nice to be able to try it out without signing up

Comment by Cole Wyeth (Amyr) on Shortform · 2024-10-08T19:23:24.187Z · LW · GW

I think it's mostly about elite outreach. If you already have a sophisticated model of the situation you shouldn't update too much on it, but it's a reasonably clear signal (for outsiders) that x-risk from A.I. is a credible concern.

Comment by Cole Wyeth (Amyr) on Overview of strong human intelligence amplification methods · 2024-10-08T17:29:55.232Z · LW · GW

Personally I'm unlikely to increase my neuron-neuron bandwidth anytime soon, sounds like a very risky intervention even if possible.

Comment by Cole Wyeth (Amyr) on Overview of strong human intelligence amplification methods · 2024-10-08T17:04:46.212Z · LW · GW

My guess is that it would be very hard to get to millions of connections, so maybe we agree, but I'm curious if you have more specific info. Why is it not the bottleneck though?

I'm not a neuroscientist / cognitive scientist, but my impression is that rapid eye movements are already much faster than my conscious deliberation. Intuitively, this means there's already a lot of potential communication / control / measurement bandwidth left on the table. There is definitely a point beyond which you can't increase human intelligence without effectively adding more densely connected neurons or uploading and increasing clock speed. Honestly I don't think I'm equipped to go deeper into the details here. 

"You're talking about a handful of people, so the benefit can't be that large."

I'm not sure I agree with either part of this sentence. If we had some really excellent intelligence augmentation software built into AR glasses we might boost on the order of thousands of people. Also I think the top 0.1% of people contribute a large chunk of economic productivity - say on the order of >5%.  

Comment by Cole Wyeth (Amyr) on Overview of strong human intelligence amplification methods · 2024-10-08T16:36:21.052Z · LW · GW

I think there's a reasonable chance everything you said is true, except:

"What you're actually doing is doing the 5% boost, and never doing the other stuff."

I intend to do the other stuff after finishing my PhD - though it's not guaranteed I'll follow through. 

The next paragraph is low confidence because it is outside of my area of expertise (I work on agent foundations, not neuroscience):

The problem with Neuralink etc. is that they're trying to solve the bandwidth problem, which is not currently the bottleneck and will take too long to yield any benefits. A full neural lace is maybe similar to a technical solution to alignment in the sense that we won't get either within 20 years at our current intelligence levels. Also, I am not in a position where I have enough confidence in my sanity and intelligence metrics to tamper with my brain by injecting neurons into it and stuff. On the other hand, even minor non-invasive general fluid intelligence increase at the top of the intelligence distribution would be incredibly valuable, and profits could be reinvested in more hardcore augmentation down the line. I'd be interested to hear where you disagree with this. 

It almost goes without saying that if you can make substantial progress on the hardcore approaches that would be much, much more valuable than what I am suggesting, and I encourage you to try.

Comment by Cole Wyeth (Amyr) on Overview of strong human intelligence amplification methods · 2024-10-08T16:11:26.450Z · LW · GW

I think I'm more optimistic about starting with relatively weak intelligence augmentation. For now, I test my fluid intelligence at various times throughout the day (I'm working on better tests justified by algorithmic information theory in the style of Prof. Hernandez-Orallo - like this one, which unfortunately sucks to take: https://github.com/mathemajician/AIQ - but for now I use my own: https://github.com/ColeWyeth/Brain-Training-Game), and I correlate the results with everything else I track about my lifestyle using Reflect (https://apps.apple.com/ca/app/reflect-track-anything/id6463800032), which I endorse, though I should note it's owned/invented by a couple of my friends/former coworkers. I'll post some intermediate results soon. Obviously this kind of approach alone will probably only provide a low single-digit IQ boost at most, but I think it makes sense to pick the low-hanging fruit first (then attempt incrementally harder stuff with the benefit of being slightly smarter). Also, accurate metrics and data collection should be established as early as possible. Ultimately I want to strap some AR goggles on and measure my fluid intelligence in real time, ideally from eye movements in response to some subconscious stimulation (I haven't vetted the plausibility of this idea at all). 
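
The analysis side of this is simple once both logs are exported; a rough sketch (the file and column names are made up for illustration, not the actual export formats):

```python
import pandas as pd

# Hypothetical exports: one log of timestamped test scores, one of daily lifestyle variables.
scores = pd.read_csv("fluid_intelligence_scores.csv", parse_dates=["timestamp"])
lifestyle = pd.read_csv("reflect_export.csv", parse_dates=["date"])

# Aggregate test scores by day and join them against the daily lifestyle variables.
daily = scores.set_index("timestamp").resample("D")["score"].mean()
merged = lifestyle.set_index("date").join(daily.rename("score"), how="inner")

# Simple first pass: correlate each tracked variable with the day's mean score.
print(merged.corr(numeric_only=True)["score"].sort_values())
```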

Comment by Cole Wyeth (Amyr) on A Narrow Path: a plan to deal with AI extinction risk · 2024-10-07T15:54:21.059Z · LW · GW

The executive summary seems essentially right to me. My only objection is that Phase 4 should probably be human intelligence augmentation.

Comment by Cole Wyeth (Amyr) on Compelling Villains and Coherent Values · 2024-10-07T15:48:35.676Z · LW · GW

You raise an interesting point about virtue ethics - I don't think that is required for moral coherence, I think it is just a shortcut. A consequentialist must be prepared to evaluate ~all outcomes to approach moral coherence, but a virtue ethicist really only needs to evaluate their own actions, which is much easier. 

Comment by Cole Wyeth (Amyr) on Cole Wyeth's Shortform · 2024-10-06T20:14:02.589Z · LW · GW

Presented the Sherlockian abduction master list at a Socratica node.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2024-10-06T20:10:32.802Z · LW · GW

Presented this list and idea at a Socratica node.
 

Comment by Cole Wyeth (Amyr) on Why I’m not a Bayesian · 2024-10-06T19:25:45.473Z · LW · GW

Verbal statements often have context-dependent or poorly defined truth values, but observations are pretty (though not completely) solid. Since useful models eventually shake out into observations, the binary truth values tagging observations "propagate back" through probability theory to make useful statements about models. I am not convinced that we need a fuzzier framework - though I am interested in the philosophical justification for probability theory in the "unrealizable" case where no element of the hypothesis class is true. For instance, it seems that the universal distribution's mixture is over probabilistic models, none of which should necessarily be assumed true - they are just the widest class we can compute. 

Comment by Cole Wyeth (Amyr) on Mark Xu's Shortform · 2024-10-05T18:22:20.751Z · LW · GW

Improving computer security seems possible, but there are many other attack vectors. For instance, even if an A.I. can prove a system’s software is secure, it may choose to introduce social-engineering-style back doors if it is not aligned. It’s true that controlled A.I.s can be used to harden society, but overall I don’t find that strategy comforting.

I’m not convinced that this induction argument goes through. I think it fails on the first generation that is smarter than humans, for basically Yudkowskian reasons.

Comment by Cole Wyeth (Amyr) on Mark Xu's Shortform · 2024-10-05T16:17:28.996Z · LW · GW

Imagine that there are just a few labs with powerful A.I., all of which are responsible enough to use existing A.I. control strategies which have been prepared for this situation, and none of which open source their models. Now if they successfully use their A.I. for alignment, they will also be able to successfully use it for capabilities research. At some point, control techniques will no longer be sufficient, and we have to hope that by then A.I.-aided alignment has succeeded enough to prevent bad outcomes. I don’t believe this is a serious possibility; the first A.I. capable of solving the alignment problem completely will also be able to deceive us about solving the alignment problem (more) easily - up to and including this point, A.I. will produce partial, convincing solutions to the alignment problem which human engineers will go forward with. Control techniques will simply lower-bound the capabilities of the first unaligned A.I. that escapes, which is plausibly a net negative since it means we won’t have early high-impact warnings. If occasional A.I. escapes turn out to be non-lethal, economic incentives will favor better A.I. control, so working on this early won’t really matter. If occasional A.I. escapes turn out to be lethal, then we will die unless we solve the alignment problem ourselves.