[Market] Will AI xrisk seem to be handled seriously by the end of 2026? 2023-05-25T18:51:49.184Z
Horizontal vs vertical generality 2023-04-29T19:14:35.632Z
Core of AI projections from first principles: Attempt 1 2023-04-11T17:24:27.686Z
Is this true? @tyler_m_john: [If we had started using CFCs earlier, we would have ended most life on the planet] 2023-04-10T14:22:07.230Z
Is this true? paulg: [One special thing about AI risk is that people who understand AI well are more worried than people who understand it poorly] 2023-04-01T11:59:45.038Z
What does the economy do? 2023-03-24T10:49:33.251Z
Are robotics bottlenecked on hardware or software? 2023-03-21T07:26:52.896Z
What problems do African-Americans face? An initial investigation using Standpoint Epistemology and Surveys 2023-03-12T11:42:32.614Z
What do you think is wrong with rationalist culture? 2023-03-10T13:17:28.279Z
What are MIRI's big achievements in AI alignment? 2023-03-07T21:30:58.935Z
🤔 Coordination explosion before intelligence explosion...? 2023-03-05T20:48:55.995Z
Prediction market: Will John Wentworth's Gears of Aging series hold up in 2033? 2023-02-25T20:15:11.535Z
Somewhat against "just update all the way" 2023-02-19T10:49:20.604Z
Latent variables for prediction markets: motivation, technical guide, and design considerations 2023-02-12T17:54:33.045Z
How many of these jobs will have a 15% or more drop in employment plausibly attributable to AI by 2031? 2023-02-12T15:40:02.999Z
Do IQ tests measure intelligence? - A prediction market on my future beliefs about the topic 2023-02-04T11:19:29.163Z
What is a disagreement you have around AI safety? 2023-01-12T16:58:10.479Z
Latent variable prediction markets mockup + designer request 2023-01-08T22:18:36.050Z
Where do you get your capabilities from? 2022-12-29T11:39:05.449Z
Price's equation for neural networks 2022-12-21T13:09:16.527Z
Will Manifold Markets/Metaculus have built-in support for reflective latent variables by 2025? 2022-12-10T13:55:18.604Z
How difficult is it for countries to change their school curriculum? 2022-12-03T21:44:56.830Z
Is school good or bad? 2022-12-03T13:14:22.737Z
Is there some reason LLMs haven't seen broader use? 2022-11-16T20:04:48.473Z
Will nanotech/biotech be what leads to AI doom? 2022-11-15T17:38:18.699Z
Musings on the appropriate targets for standards 2022-11-12T20:19:38.939Z
Internalizing the damage of bad-acting partners creates incentives for due diligence 2022-11-11T20:57:41.504Z
Instrumental convergence is what makes general intelligence possible 2022-11-11T16:38:14.390Z
Have you noticed any ways that rationalists differ? [Brainstorming session] 2022-10-23T11:32:13.368Z
The highest-probability outcome can be out of distribution 2022-10-22T20:00:16.233Z
AI Research Program Prediction Markets 2022-10-20T13:42:55.113Z
Is the meaning of words chosen/interpreted to maximize correlations with other relevant queries? 2022-10-20T10:03:19.931Z
Towards a comprehensive study of potential psychological causes of the ordinary range of variation of affective gender identity in males 2022-10-12T21:10:46.440Z
What sorts of preparations ought I do in case of further escalation in Ukraine? 2022-10-01T16:44:58.046Z
Resources to find/register the rationalists that specialize in a given topic? 2022-09-29T17:20:19.752Z
Renormalization: Why Bigger is Simpler 2022-09-14T17:52:50.088Z
Are smart people's personal experiences biased against general intelligence? 2022-04-21T19:25:26.603Z
If everything is genetic, then nothing is genetic - Understanding the phenotypic null hypothesis 2022-04-20T16:25:26.323Z
The Scale Problem in AI 2022-04-19T17:46:19.969Z
Anticorrelated Noise Injection for Improved Generalization 2022-02-20T10:15:32.276Z
Framing Practicum: General Factor 2022-02-14T16:35:30.705Z
Some thoughts on "The Nature of Counterfactuals" 2022-01-16T18:12:46.865Z
Reductionism is not the ultimate tool for causal inference 2022-01-07T21:51:22.837Z
Causality and determinism in social science - An investigation using Pearl's causal ladder 2022-01-03T17:51:54.021Z
More accurate models can be worse 2021-12-28T12:20:44.010Z
Random facts can come back to bite you 2021-12-22T17:33:49.023Z
Transforming myopic optimization to ordinary optimization - Do we want to seek convergence for myopic optimization problems? 2021-12-11T20:38:46.604Z
Information bottleneck for counterfactual corrigibility 2021-12-06T17:11:12.984Z
Apparently winning by the bias of your opponents 2021-11-28T13:20:16.284Z
Stop button: towards a causal solution 2021-11-12T19:09:45.967Z


Comment by tailcalled on Open Thread With Experimental Feature: Reactions · 2023-06-02T13:20:53.919Z · LW · GW

Have you looked into whether you are ADHD?

Comment by tailcalled on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-02T11:37:26.918Z · LW · GW

I'd say usually bottlenecks aren't absolute, but instead quantifiable and flexible based on costs, time, etc.?

One could say that we've reached the threshold where we're bottlenecked on inference-compute, whereas previously talk of compute bottlenecks was about training-compute.

This seems to matter for some FOOM scenarios since e.g. it limits the FOOM that can be achieved by self-duplicating.

But the fact that AI companies are trying their hardest to scale up compute, and are also actively researching more compute-efficient algorithms, means IMO that the inference-compute bottleneck will be short-lived.

Comment by tailcalled on Short Remark on the (subjective) mathematical 'naturalness' of the Nanda--Lieberum addition modulo 113 algorithm · 2023-06-02T10:18:10.410Z · LW · GW

Here's another way of looking at it which could be said to make it more trivial:

We can transform addition into multiplication by taking the exponential, i.e. x+y=z is equivalent to 10^x * 10^y = 10^z.

But if we unfold the digits into separate axes rather than as a single number, then 10^n is just a one-hot encoding of the integer n.

Taking the Fourier transform of the digits to do convolutions is a well-known fast multiplication algorithm.
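To make the triviality concrete, here's a minimal sketch (my own illustration, not from the post): addition mod n is exactly circular convolution of one-hot encodings, and circular convolution is elementwise multiplication in Fourier space.

```python
import numpy as np

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

n = 113          # the modulus from the Nanda-Lieberum setup
x, y = 40, 101

# x + y (mod n) as circular convolution of one-hot encodings,
# computed via FFT: conv(a, b) = IFFT(FFT(a) * FFT(b)).
z = np.fft.ifft(np.fft.fft(one_hot(x, n)) * np.fft.fft(one_hot(y, n))).real
print(int(np.argmax(z)))  # 28, i.e. (40 + 101) % 113
```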

Comment by tailcalled on Short Remark on the (subjective) mathematical 'naturalness' of the Nanda--Lieberum addition modulo 113 algorithm · 2023-06-02T10:10:59.703Z · LW · GW

Is this really an accurate analogy? I feel like clock arithmetic would be more like representing it as a rotation matrix, not a Fourier basis.

Comment by tailcalled on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-01T21:51:36.626Z · LW · GW
Comment by tailcalled on Reacts now randomly enabled on 50% of posts (you can still manually change this yourself in the post settings) · 2023-05-29T16:02:06.856Z · LW · GW

Likely too complex and unwieldy to be implemented in practice or make sense as a react system, but I thought I would mention it just in case:

Some of the reacts, such as "I already addressed this", could possibly benefit from some sort of "pointing" functionality, selecting the part(s) of the discussion where one addressed it. Similarly, "Please elaborate" could possibly benefit from selecting the part(s) that one wants elaborated.

Comment by tailcalled on Reacts now randomly enabled on 50% of posts (you can still manually change this yourself in the post settings) · 2023-05-29T11:01:19.001Z · LW · GW

Maybe some reacts, such as "Will reply later", should not support negative reacts.

Comment by tailcalled on Reacts now randomly enabled on 50% of posts (you can still manually change this yourself in the post settings) · 2023-05-28T10:01:05.880Z · LW · GW

Idea: to address this issue of reacts potentially leading to less texty responses in an unconfounded way, maybe for a period of time during later experiments you could randomly enable reacts on half of all new posts?

Might be silly though. At least it's not very worthwhile without a measure of how well it goes. Potentially total amount of text written in discussions could function as such a measure, but it seems kind of crude.

Comment by tailcalled on [Market] Will AI xrisk seem to be handled seriously by the end of 2026? · 2023-05-26T07:06:36.179Z · LW · GW

No, by the "If my opinion on the inherent danger of AI xrisk changes during the resolution period, I will try to respond based on the level of risk implied by my criteria, not based on my later evaluation of things." rule, but maybe in such a case I would change the title to reflect the relevant criteria.

Comment by tailcalled on Open Thread With Experimental Feature: Reactions · 2023-05-25T16:42:13.540Z · LW · GW

Another potential option: the person who is getting reacted to should have an option to request explanations for the reacts, and if requested, providing such explanations should receive bonus karma or something.

Comment by tailcalled on Open Thread With Experimental Feature: Reactions · 2023-05-25T16:37:41.689Z · LW · GW

I agree that this is a potential downside. However:

  • I think this has the potential to elicit more information rather than reducing the amount of information, if the use of reacts by lurkers sufficiently exceeds the downgrading of comments to reacts by non-lurkers.
  • I think this has the potential to improve social norms on LessWrong by providing a neat way for people to express directions of desired change. Social norms aren't always about providing precise information, but instead also often about adjusting broader behaviors.

But I agree that this is potentially concerning enough that it should probably be tracked, and LessWrong should be ready to drop the feature again if it turns out badly.

Galaxybrained idea: use this system to incentivize more detailed texts by allowing people to get custom reacts made for posts that explain some problematic dynamic, and have the reacts link back to that post.

Comment by tailcalled on Open Thread With Experimental Feature: Reactions · 2023-05-25T11:52:08.970Z · LW · GW

Another suggestion, maybe a "too verbose" reaction? And a "too abstract" reaction?

Comment by tailcalled on Open Thread With Experimental Feature: Reactions · 2023-05-25T08:52:37.259Z · LW · GW

I think a reaction which suggests that something is mislabelled might also be helpful. Like for reacting to misleading titles (... can people react to top-level posts?) or similar.

Comment by tailcalled on Bayesian Networks Aren't Necessarily Causal · 2023-05-21T19:13:13.606Z · LW · GW

I think this approach only gets the direction of the arrows from two structures, which I'll call colliders and instrumental variables (because that's what they are usually called).

Colliders are the case of A -> B <- C, which in terms of correlations shows up as A and B being correlated, B and C being correlated, and A and C being independent. This is a distinct pattern of correlations from the A -> B -> C or A <- B -> C structures where all three could be correlated, so it is possible for this method to distinguish the structures (well, sometimes not, but that's tangential to my point).

Instrumental variables are the case of A -> B -> C, where A -> B is known but the direction of B - C is unknown. In that case, the fact that C correlates with A suggests that B -> C rather than B <- C.
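A quick simulation (mine, with made-up coefficients) shows both signatures: the collider leaves A and C uncorrelated, while in a chain the A-C correlation is proportional to the product of the edge effects, so it shrinks fast as the effects weaken.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Collider: A -> B <- C. A and C independent, both correlated with B.
A = rng.normal(size=n)
C = rng.normal(size=n)
B = A + C + rng.normal(size=n)
print(np.corrcoef(A, C)[0, 1])   # ~0: the collider signature
print(np.corrcoef(A, B)[0, 1])   # clearly positive

# Chain: A -> B2 -> C2 with effect sizes a and b.
a, b = 0.5, 0.5
B2 = a * A + rng.normal(size=n)
C2 = b * B2 + rng.normal(size=n)
print(np.corrcoef(A, C2)[0, 1])  # nonzero (~0.22 here): evidence for B2 -> C2
```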

I think the main advantage larger causal networks give you is that they give you more opportunities to apply these structures?

But I see two issues with them. First, they don't seem to work very well in nondeterministic cases. Both rely on the correlation between A and C, and both need to distinguish whether that correlation is 0 or ab (where a and b refer to the effect sizes of A - B and B - C respectively). If the effects in your causal network are of order ε, then you are basically trying to distinguish something of order 0 from something of order ε², which is likely going to be hard if ε is small. (The smaller the difference you are trying to detect, the more affected you are going to be by model misspecification, unobserved confounders, measurement error, etc.) This is not a problem in Zack's case because his effects are near-deterministic, but it would be a problem in other cases. (I in particular have various social science applications in mind.)

Secondly, Zack's example had the advantage that multiple root causes of wet sidewalks were measured. This gave him a collider to kick off the inference process. (Though I actually suspect this is unrealistic - wouldn't you be less likely to turn on the sprinkler if it's raining? But that relationship would be much less deterministic, so it's probably OK here.) But this seems very much like a luxury that often doesn't occur in the practical cases where I've seen people attempt to apply this. (Again, various social science applications.)

Comment by tailcalled on Bayesian Networks Aren't Necessarily Causal · 2023-05-21T11:19:21.902Z · LW · GW

I don't think this is much better than just linking up variables to each other if they are strongly correlated (at least in ways not explained by existing links)?

Comment by tailcalled on Matthew Barnett's Shortform · 2023-05-16T11:19:42.573Z · LW · GW

Not clear to me what capabilities the AIs have compared to the humans in various steps in your story or where they got those capabilities from.

Comment by tailcalled on Bayesian Networks Aren't Necessarily Causal · 2023-05-14T21:32:49.013Z · LW · GW

Causality is useful mainly insofar as different instances can be compactly described as different simple interventions on the same Bayes net.

Thinking about this algorithmically: In e.g. factor analysis, after performing PCA to reduce a high-dimensional dataset to a low-dimensional one, it's common to use varimax to "rotate" the principal components so that each resulting axis has a sparse relationship with the original indicator variables (each "principal" component correlating only with one indicator). However, this instead seems to suggest that one should rotate them so that the resulting axes have a sparse relationship with the original cases (each data point deviating from the mean on as few "principal" components as possible).

I believe that this sort of rotation (without the PCA) has actually been used in certain causal inference algorithms, but as far as I can tell it basically assumes that causality flows from variables with higher kurtosis to variables with lower kurtosis, which admittedly seems plausible for a lot of cases, but also seems like it consistently gives the wrong results if you've got certain nonlinear/thresholding effects (which seem plausible in some of the areas I've been looking to apply it).
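As a toy illustration of that kurtosis heuristic (my own sketch, in the spirit of LiNGAM-style methods, not any specific algorithm): an effect is a sum of independent terms, so its distribution is closer to Gaussian, i.e. has lower excess kurtosis, than its heavy-tailed cause.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def excess_kurtosis(x):
    m = x - x.mean()
    return (m**4).mean() / (m**2).mean() ** 2 - 3

X = rng.laplace(size=n)                  # cause: Laplace, excess kurtosis ~3
Y = 0.7 * X + 0.7 * rng.laplace(size=n)  # effect: a sum, excess kurtosis ~1.5
print(excess_kurtosis(X), excess_kurtosis(Y))
```

So a method that points the arrow from higher to lower kurtosis gets this case right - but, as noted above, thresholding effects can flip the kurtosis ordering and make it point the wrong way.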

Not sure whether you'd say I'm thinking about this right? 

For instance, in the sprinkler system, some days the sprinkler is in fact turned off, or there's a tarp up, or what have you, and then the system is well-modeled by a simple intervention.

I'm trying to think of why modelling this using a simple intervention is superior to modelling it as e.g. a conditional. One answer I could come up with is if there's some correlations across the different instances of the system, e.g. seasonable variation in rain or similar, or turning the sprinkler on partway through a day. Though these sorts of correlations are probably best modelled by expanding the Bayesian network to include time or similar.

Comment by tailcalled on Towards Measures of Optimisation · 2023-05-13T09:40:07.072Z · LW · GW

Another way of looking at this is that only ratios of differences in utility are real.

I suspect real-world systems replace argmax with something like softmax, and in that case the absolute scale of the utility function becomes meaningful too (representing the scale at which it even bothers optimizing).
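A two-line check of that point (illustrative): argmax is invariant to rescaling the utilities, while softmax sharpens as the scale grows, so the absolute scale carries real information about how hard the system optimizes.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - np.max(u))
    return e / e.sum()

u = np.array([1.0, 2.0, 3.0])
assert np.argmax(u) == np.argmax(10 * u)  # argmax: scale-invariant
print(softmax(u))       # spread out: weak optimization
print(softmax(10 * u))  # nearly one-hot: strong optimization
```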

Comment by tailcalled on I bet $500 on AI winning the IMO gold medal by 2026 · 2023-05-11T19:49:00.414Z · LW · GW

I think math is "easy" in the sense that we have proof assistants that can verify proofs so AIs can learn it through pure self-play. Therefore I agree that AI will probably soon solve math, but I disagree that it indicates particularly high capabilities gain.

Comment by tailcalled on LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem · 2023-05-08T20:33:22.922Z · LW · GW

Nice, I was actually just thinking that someone needed to respond to LeCun's proposal.

That said, I think you may have gotten some of the details wrong. I don't think the intrinsic cost module gets raw sensory data as input, but instead it gets input from the latent variables of the world model as well as the self-supervised perception module. This complicates some of the safety problems you suggest.

Comment by tailcalled on Solomonoff’s solipsism · 2023-05-08T07:24:47.884Z · LW · GW

See Thou Art Physics.

Comment by tailcalled on Statistical models & the irrelevance of rare exceptions · 2023-05-07T17:55:00.587Z · LW · GW

Could you give an example of a dispute that follows the structure you are talking about?

Also, as a counterpoint, there's the arguments made for determinism in Science in a High-Dimensional World.

Comment by tailcalled on Systems that cannot be unsafe cannot be safe · 2023-05-02T19:28:39.939Z · LW · GW

For the first point, if "people can in fact recognize some types of unsafety," then it's not the case that "you don't even have a clear idea of what would constitute unsafe." And as I said in another comment, I think this is trying to argue about standards, which is a necessity in practice for companies that want to release systems, but isn't what makes the central point, which is the title of the post, true.

Maybe I am misunderstanding what you mean by "have a clear idea of what would constitute unsafe"?

Taking rods as an example, my understanding is that rods might be used to support some massive objects, and if the rods bend under the load then they might release the objects and cause harm. So the rods need to be strong enough to support the objects, and usually rods are sold with strength guarantees to achieve this.

"If it would fail under this specific load, then it is unsafe" is a clear idea of what would constitute unsafe. I don't think we have this clear of an idea for AI. We have some vague ideas of things that would be undesirable, but there tends to be a wide range of potential triggers and a wide range of potential outcomes, which seem more easily handled by some sort of adversarial setup than by writing down a clean logical description. But maybe when you say "clear idea", you don't necessarily mean a clean logical description, and also consider more vague descriptions to be relevant?
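For contrast, the rod-style criterion really can be written as a one-liner (numbers and the safety factor are invented for illustration; real standards are more involved):

```python
def rod_is_safe(rated_load_kN: float, applied_load_kN: float,
                safety_factor: float = 1.5) -> bool:
    """A crisp, checkable criterion: unsafe iff the applied load
    (times a margin) exceeds the rated strength."""
    return applied_load_kN * safety_factor <= rated_load_kN

print(rod_is_safe(100.0, 50.0))  # True
print(rod_is_safe(100.0, 80.0))  # False: 80 * 1.5 > 100
```

Nothing comparably crisp exists for "this LLM output is unsafe", which is the asymmetry I'm pointing at.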

And I agree that rods are often simple, and the reason that I chose rods as an example is because people have an intuitive understanding of some of the characteristics you care about. But the same conceptual model applies to cars, where there is tons of specific safety testing with clearly defined standards, despite the fact that their behavior can be very, very complex.

I already addressed cars and you said we should talk about rods. Then I addressed rods and you want to switch back to cars. Can you make up your mind?

Comment by tailcalled on Systems that cannot be unsafe cannot be safe · 2023-05-02T13:10:34.971Z · LW · GW

I'm not saying that a standard is sufficient for safety, just that it's incoherent to talk about safety if you don't even have a clear idea of what would constitute unsafe.

I can believe it makes it less definitive and less useful, but I don't buy that it makes it "meaningless" and entirely "incoherent". People can in fact recognize some types of unsafety, and adversarially try to trigger unsafety. I would think that the easier it is to turn GPT into some aggressive powerful thing, the more likely ARC would have been to catch it, so ARC's failure to make GPT do dangerous stuff would seem to constitute Bayesian evidence that it is hard to make it do dangerous stuff.

Also, I wasn't talking about cars in particular - every type of engineering, including software engineering, follows this type of procedure for verification and validation, when those are required. And I think metal rods are a better example to think about - we don't know what it is going to be used for when it is made, but whatever application the rod will be used for, it needs to have some clear standards and requirements.

AFAIK rods are a sufficiently simple artifact that almost all of their behavior can be described using very little information, unlike cars and GPTs?

Comment by tailcalled on Systems that cannot be unsafe cannot be safe · 2023-05-02T10:31:29.563Z · LW · GW

You're making the assumption that the safety methods for cars are appropriate to transfer directly to e.g. LLMs. That's not clearly true to me as there are strong differences in the nature of cars vs the nature of LLMs. For instance the purposes and capacities of cars are known in great detail (driving people from place to place), whereas the purposes of LLMs are not known (we just noticed that they could do a lot of neat things and assumed someone will find a use-case for them) and their capabilities are much broader and less clear.

I would be concerned that your proposed safety method would become very prone to Goodharting.

Comment by tailcalled on Horizontal vs vertical generality · 2023-05-01T08:24:34.433Z · LW · GW

That seems reasonable to me.

Comment by tailcalled on Horizontal vs vertical generality · 2023-04-30T13:15:17.805Z · LW · GW

I think there's a distinction between the environment being in ~equilibrium and you wrestling a resource out from the equilibrium, versus you being part of a greater entity which wrestles resources out from the equilibrium and funnels them to your part?

Comment by tailcalled on Horizontal vs vertical generality · 2023-04-30T08:39:34.534Z · LW · GW

In your framework, self-improving AI is vertically general (since it can do everything necessary for the task of AI R&D)

It might actually not be; it's sort of hard to be vertically general.

An AI needs electricity and hardware. If it gets its electricity from its human creators and needs its human creators to actively choose to maintain its hardware, then those are necessary subtasks in AI R&D which it can't solve itself.

I think it makes sense to distinguish between a self-improving AI which can handle contract negotiations etc. in order to earn the money needed to buy electricity and hire people to handle its hardware, vs an AI that must be owned in order to achieve this.

That said a self-improving AI may still be more vertically general than other things. I think it's sort of a continuum.

Even though this list isn't very long, lacking these abilities greatly decreases the horizontal generality of the AI.

One thing that is special about self-improving AIs is that they are, well, self-improving. So presumably they either increase their horizontal generality, their vertical generality, or their cost-efficiency over time (or more likely, increase a combination of them).

Comment by tailcalled on Horizontal vs vertical generality · 2023-04-30T07:53:15.588Z · LW · GW

I like to literally imagine a big list of tasks, along the lines of:

  1. Invent and deploy a new profitable AI system
  2. Build a skyscraper
  3. Form a cat, which catches and eats mice, mates and raises kittens
  4. etc.

An operationalization of horizontal generality would then be the number of tasks on the list that something can contribute to. For instance restricting ourselves to the first three items, a cat has horizontal generality 1, a calculator has horizontal generality 2, and a superintelligence has generality 3.

Within each task, we can then think of various subtasks that are necessary to complete it, e.g. for building a skyscraper, you need land, permissions, etc., and then you need to dig, set up stuff, pour concrete, etc. (I don't know much about skyscrapers, can you tell? 😅). Each of these subtasks need some physical interventions (which we ignore because this is about intelligence, though they may be relevant for evaluating the generality of robotics rather than of intelligence) and some cognitive processing. The fraction of the required cognitive subtasks that can be performed by an entity within a task is its vertical generality (within that specific task).
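The operationalization above can be sketched directly (the task list and the abilities assigned to each entity are illustrative, not canonical):

```python
tasks = ["deploy a profitable AI system", "build a skyscraper", "form a cat"]

# Which tasks each entity can contribute *something* to (hypothetical).
contributes = {
    "cat": {"form a cat"},
    "calculator": {"deploy a profitable AI system", "build a skyscraper"},
    "superintelligence": set(tasks),
}

# Horizontal generality: number of tasks one can contribute to.
horizontal = {name: len(ts) for name, ts in contributes.items()}
print(horizontal)  # {'cat': 1, 'calculator': 2, 'superintelligence': 3}

# Vertical generality (within one task): fraction of that task's
# cognitive subtasks the entity can perform itself.
subtasks = {"build a skyscraper": ["get land", "get permits", "dig", "pour concrete"]}

def vertical(can_do, task):
    todo = subtasks[task]
    return sum(s in can_do for s in todo) / len(todo)

print(vertical({"dig", "pour concrete"}, "build a skyscraper"))  # 0.5
```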

Comment by tailcalled on [SEE EDIT] No, *You* Need to Write Clearer · 2023-04-29T09:47:48.588Z · LW · GW

This post made me start wondering - have the shard theorists written posts about what they think is the most dangerous realistic alignment failure?

Comment by tailcalled on AI chatbots don't know why they did it · 2023-04-27T20:52:55.946Z · LW · GW

Hmm upon reflection I might just have misremembered how GPTs work.

Comment by tailcalled on AI chatbots don't know why they did it · 2023-04-27T16:54:38.569Z · LW · GW

I agree that it likely confabulates explanations.

Comment by tailcalled on AI chatbots don't know why they did it · 2023-04-27T08:27:16.921Z · LW · GW

How can I be sure of this? Based on how the API works, we know that a chatbot has no short-term memory.[2] The only thing a chatbot remembers is what it wrote. It forgets its thought process immediately.

Isn't this false? I thought the chatbot is deterministic (other than sampling tokens from the final output probabilities) and that every layer of the transformer is autoregressive, so if you add a question to an existing conversation, it will reconstruct its thoughts underlying the initial conversation, and then allow those thoughts to affect its answers to your followup questions.
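The determinism point can be checked on a toy single-head causal attention layer (a sketch, not GPT's actual architecture): because of the causal mask, the activations over a prefix are bit-identical whether or not later tokens are appended, so the "thought process" over earlier turns is recomputed exactly when the conversation is extended.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def causal_attention(X):
    # X: (seq_len, d). Single-head masked self-attention.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf  # each position attends only to itself and earlier
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

X = rng.normal(size=(10, d))
short = causal_attention(X[:6])
full = causal_attention(X)
# First 6 positions identical: appending tokens doesn't change the prefix.
print(np.allclose(short, full[:6]))  # True
```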

Comment by tailcalled on How Many Bits Of Optimization Can One Bit Of Observation Unlock? · 2023-04-26T08:35:18.501Z · LW · GW

Channel capacity between goals and outcomes is sort of a counterintuitive measure of optimization power to me, and so if there are any resources on it, I would appreciate them.

Comment by tailcalled on How did LW update p(doom) after LLMs blew up? · 2023-04-22T21:30:16.649Z · LW · GW

I basically agree that LLMs don't seem all that inherently dangerous and am somewhat confused about rationalists' reaction to them. LLMs seem to have some inherent limitations.

That said, I could buy that they could become dangerous/accelerate timelines. To understand my concern, let's consider a key distinction in general intelligence: horizontal generality vs vertical generality.

  • By horizontal generality, I mean the ability to contribute to many different tasks. LLMs supersede or augment search engines in being able to funnel information from many different places on the internet right to a person who needs it. Since the internet contains information about many different things, this is often useful.
  • By vertical generality, I mean the ability to efficiently complete tasks with minimal outside assistance. LLMs do poorly on this, as they lack agency, actuators, sensors and probably also various other things needed to be vertically general.

(You might think horizontal vs vertical generality is related to breadth vs depth of knowledge, but I don't think it is. The key distinction is that breadth vs depth of knowledge concerns fields of information, whereas horizontal vs vertical generality concerns tasks. Inputs vs outputs. Some tasks may depend on multiple fields of knowledge, e.g. software development depends on programming capabilities and understanding user needs, which means that depth of knowledge doesn't guarantee vertical generality. On the other hand, some fields of knowledge, e.g. math or conflict resolution, may give gains in multiple tasks, which means that horizontal generality doesn't require breadth of knowledge.)

While we have had previous techniques like AlphaStar with powerful vertical generality, they required a lot of data from the domains they functioned in to be useful, and they do not readily generalize to other domains.

Meanwhile, LLMs have powerful horizontal generality, and so people are integrating them into all sorts of places. But I can't help but wonder - I think the integration of LLMs in various places will develop their vertical generality, partly by giving them access to more data, and partly by incentivizing people to develop programmatic scaffolding which increases their vertical generality.

So LLMs getting integrated everywhere may incentivize removing their limitations and speeding up AGI development.

Comment by tailcalled on How did LW update p(doom) after LLMs blew up? · 2023-04-22T21:17:26.553Z · LW · GW

I suspect it's worth distinguishing cults from delusional ideologies. As far as I can tell, it is common for ideologies to have inelastic, false, poorly-founded beliefs; the classic example is belief in the supernatural. I'm not sure where the exact line between cultishness and delusion lies, but I suspect that it's often useful to define cultishness as something like treating opposing ideologies as infohazards. While rationalists are probably guilty of this, the areas where they are guilty of it don't seem to be p(doom) or LLMs, so it might not be informative to focus cultishness accusations there.

Comment by tailcalled on Human level AI can plausibly take over the world · 2023-04-20T11:09:44.493Z · LW · GW

Counterpoint: before this happens, the low-hanging fruit will be grabbed by similar strategies that are dumber and have lower aspirations.

Comment by tailcalled on Consequentialism is in the Stars not Ourselves · 2023-04-19T22:17:48.899Z · LW · GW

I think you might also be discounting what's being selected on. You wrote:

Whereas I think I could easily do orders of magnitude more in a day.

You can apply orders of magnitude more optimization power to something on some criterion. But evolution's evaluation function is much higher quality than yours. It evaluates the success of a complex organism in a complex environment, which is very complex to evaluate and is relevant to deep things (such as discovering intelligence). In a day, you are not able to do 75 bits of selection on cognitive architectures being good for producing intelligence.

I agree that this is an important distinction and didn't mean to imply that my selection is on criteria that are as difficult as evolution's.

Comment by tailcalled on Deceptive Alignment is <1% Likely by Default · 2023-04-19T21:12:19.565Z · LW · GW

I'm explicitly not addressing other failure modes in this post. 

Yes, I know, I gave the other failure modes as an example. The thing that confuses me is that you are saying that the (IMO) central piece in the AI algorithm doesn't really matter for the purposes of your post.

What are you referring to as the program here? Is it the code produced by the AI that is being evaluated by people who don't know how to code?


Why would underqualified evaluators result in an ulterior motive? And to make it more specific to this post, why would that cause the base goal understanding to come later than goal directedness and around the same time as situational awareness and a very long-term goal?

It's not meant as an example of deceptive misalignment, it's meant as an example of how alignment failures by-default depend absolutely massively on the way you train your AI. Like if you train your AI in a different way, you get different failures. So it seems like a strange prior to me to assume that you will get the same results wrt deceptive alignment regardless of how you train it.

Comment by tailcalled on Consequentialism is in the Stars not Ourselves · 2023-04-19T20:11:01.468Z · LW · GW

E.g. if only 1/2^n organisms reach sexual maturity, that's n bits / generation right there.

I think the child mortality rate used to be something like 50%, which is 1 bit/generation.

If you include gametes being selected (e.g. which sperm swim the fastest), that could be a bunch more bits.

I think an ejaculation involves ~100,000,000 sperm, which corresponds to about 26.5 bits, assuming the very "best" sperm is selected (which itself seems unlikely; I feel like surely there's a ton of noise).

If every man is either a loser or Genghis Khan, that's another 10-ish bits.

I would be curious to know how you calculated this, but sure.

This gives us 37.5 bits so far. Presumably there's also a bunch of other bits due to e.g. not all pregnancies making it, etc. Let's be conservative and say that we've only got half the bits in this count, so the total would be 75 bits.

That's... not very much? Like I think I can easily write 1000 lines of code in a day, where each LOC would probably contain more than 75 bits worth of information. So I could easily 1000x exceed the selection power of evolution, in a single day worth of programming.
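The running total above (1 + 26.5 + 10 ≈ 37.5 bits, doubled to ~75) is easy to sanity-check. A minimal sketch, using the comment's own assumed figures (50% child mortality, ~10^8 sperm per ejaculation, ~1/1024 of men reproducing) rather than any measured values:

```python
from math import log2

# Each selection event contributes log2(1/p) bits, where p is the
# fraction of candidates that pass. All figures are the thread's
# own assumptions, not empirical data.
child_mortality_bits = log2(2)            # ~50% reach maturity -> 1 bit
sperm_bits = log2(100_000_000)            # ~1e8 sperm, "best" wins -> ~26.6 bits
genghis_khan_bits = log2(1024)            # ~1/1024 men reproduce -> 10 bits

counted = child_mortality_bits + sperm_bits + genghis_khan_bits
total = 2 * counted  # assume the counted sources are only half the story

print(round(counted, 1))  # ≈ 37.6
print(round(total, 1))    # ≈ 75.2
```

The small discrepancy with the thread's 37.5 comes from rounding log2(10^8) down to 26.5; either way the order of magnitude (tens of bits per generation) is unchanged.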

In artificial breeding, there can be even more bits / generation.

Somewhat more, but not hugely more I think? And regardless, humanity doesn't originate from artificial breeding.

Comment by tailcalled on Deceptive Alignment is <1% Likely by Default · 2023-04-19T15:09:01.241Z · LW · GW

That the specific ways people give feedback isn't very relevant.

It seems like the core thing determining the failure modes to me. E.g. if you just show the people who give feedback the source code of the program, then the failure mode will often be that the program immediately crashes or doesn't even compile. Meanwhile, if you show people the running program, that cannot be a failure mode.

If you agree that the failure modes are generally determined by the feedback process, but think that deceptive misalignment is somehow an exception to this rule, then I don't see the justification for that exception and would like it addressed explicitly.

Comment by tailcalled on Deceptive Alignment is <1% Likely by Default · 2023-04-19T14:24:15.739Z · LW · GW

Does that answer your question?

Yes, but then I disagree with the assumptions underlying your post, and I expect things based on your post to be derailed by the errors that have been introduced.

Comment by tailcalled on Green goo is plausible · 2023-04-19T12:54:20.961Z · LW · GW

I agree that one could do something similar with other tech than neat biotech, but I don't think this proves that Kudzugoth Alignment is as difficult as general alignment. I think aligning AI to achieve something specific is likely to be a lot easier than aligning AI in general. It's questionable whether the latter is even possible and unclear what it means to achieve it.

Comment by tailcalled on Consequentialism is in the Stars not Ourselves · 2023-04-19T08:03:31.475Z · LW · GW

Can you list some examples of durable/macroscale effects you have in mind?

Comment by tailcalled on Consequentialism is in the Stars not Ourselves · 2023-04-19T07:51:51.524Z · LW · GW

I might be wrong but I think evolution only does a smallish number of bits worth of selection per generation? Whereas I think I could easily do orders of magnitude more in a day.

Comment by tailcalled on Green goo is plausible · 2023-04-19T07:39:07.650Z · LW · GW

Basically if artificial superintelligence happens before sufficiently advanced synthetic biology, then one way to frame the alignment problem is "how do we make an ASI create a nice kudzugoth instead of a bad kudzugoth?".

Comment by tailcalled on Consequentialism is in the Stars not Ourselves · 2023-04-19T07:36:39.318Z · LW · GW

Most of the work done by (mechanistic) consequentialism happens in the outer selection process that produced a system, not in the system so selected.

I think "work done" is the wrong metric for many practical purposes (rewards wastefulness) and one should instead focus on "optimization achieved" or something.

Comment by tailcalled on Green goo is plausible · 2023-04-19T06:23:12.869Z · LW · GW

Without major breakthroughs (Artificial Superintelligence) there's no meaningful "alignment plan", just a scientific discipline. There's no sense in which you can really "align" an AI system to do this.

Do you expect humanity to bioengineer this before we develop artificial superintelligence? If not, presumably this objection is irrelevant.

Comment by tailcalled on Green goo is plausible · 2023-04-18T21:08:59.716Z · LW · GW

So I'd really like to have some idea how to build a machine that teaches a plant to do something like a safe, human-compatible version of this.

🤔 This is actually a path to progress, right? The difficulty in alignment is figuring out what we want precisely enough that we can make an AI do it. It seems like a feasible research project to map this out for kudzugoth.

Seems convincing enough that I'm gonna make a Discord and maybe switch to this as a project. Come join me at Kudzugoth Alignment Center! ... 😅 I might close again quickly if the plan turns out to be fatally flawed, but until then, here we go.

Comment by tailcalled on Deceptive Alignment is <1% Likely by Default · 2023-04-18T12:13:54.470Z · LW · GW

I'm not so much asking the question of what the tasks are, and instead asking what exactly the setup would be.

For example, if I understand the paper Cotra linked to correctly, they directly showed the raters what the model's output was and asked them to rate it. Is this also the feedback mode you are assuming in your post?

For example in order to train an AI to do advanced software development, would you show unspecialized workers in India how the model describes it would edit the code? If not, what feedback signal are you assuming?