Posts

H5N1. Just how bad is the situation? 2023-07-08T22:09:33.928Z
Three levels of exploration and intelligence 2023-03-16T10:55:02.196Z
Maybe you can learn exotic experiences via analytical thought 2023-01-20T01:50:48.938Z
Motivated Cognition and the Multiverse of Truth 2022-11-22T12:51:26.405Z
Method of statements: an alternative to taboo 2022-11-16T10:57:49.937Z
The importance of studying subjective experience 2022-10-21T08:43:25.191Z
What if human reasoning is anti-inductive? 2022-10-11T05:15:49.373Z
Statistics for objects with shared identities 2022-10-03T09:21:15.884Z
About Q Home 2022-09-28T04:56:44.586Z
Probabilistic reasoning for description and experience 2022-09-27T10:57:06.217Z
Should AI learn human values, human norms or something else? 2022-09-17T06:19:16.482Z
Ideas of the Gaps 2022-09-13T10:55:47.772Z
Can you force a neural network to keep generalizing? 2022-09-12T10:14:27.181Z
Can "Reward Economics" solve AI Alignment? 2022-09-07T07:58:49.397Z
Informal semantics and Orders 2022-08-27T04:17:09.327Z
Vague concepts, family resemblance and cluster properties 2022-08-20T10:21:31.475Z
Statistics for vague concepts and "Colors" of places 2022-08-19T10:33:53.540Z
Q Home's Shortform 2022-08-16T23:52:08.773Z
Content generation. Where do we draw the line? 2022-08-09T10:51:37.446Z
Thinking without priors? 2022-08-02T09:17:45.622Z
Relationship between subjective experience and intelligence? 2022-07-24T09:10:10.494Z
Inward and outward steelmanning 2022-07-14T23:32:47.452Z

Comments

Comment by Q Home on Being nicer than Clippy · 2024-04-30T08:35:37.616Z · LW · GW

I'm noticing two things:

  1. It's suspicious to me that the values of humans-who-like-paperclips are inherently tied to acquiring an unlimited amount of resources (no matter how). Maybe I don't treat such values as 100% innocent, so I'm OK with keeping them in check. Though we can come up with thought experiments where the urge to get more resources is justified by something. Like, maybe instead of producing paperclips those people want to calculate Busy Beaver numbers, so they want more and more computronium for that.
  2. How consensual were the trades if their outcome is predictable and other groups of people don't agree with the outcome? Looks like coercion.
Comment by Q Home on Examples of Highly Counterfactual Discoveries? · 2024-04-24T06:00:46.888Z · LW · GW

Often I see people dismiss the things the Epicureans got right with an appeal to their lack of the scientific method, which has always seemed a bit backwards to me.

The most important thing, I think, is not even hitting the nail on the head, but knowing (i.e. really acknowledging) that a nail can be hit in multiple places. If you know that, the rest is just a matter of testing.

Comment by Q Home on Why I no longer identify as transhumanist · 2024-02-06T09:50:24.619Z · LW · GW

But avoidance of value drift or of unendorsed long term instability of one's personality is less obvious.

What if endorsed long term instability leads to negation of personal identity too? (That's something I thought about.)

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-05T23:33:34.947Z · LW · GW

I think corrigibility is the ability to change a value/goal system. That's the literal meaning of the term... "Correctable". If an AI were fully aligned, there would be no need to correct it.

Perhaps I should make a better argument:

It's possible that AGI is correctable, but (a) we don't know what needs to be corrected or (b) we cause new, less noticeable problems while correcting the AGI.

So, I think there aren't two assumptions ("alignment/interpretability is not solved" + "AGI is incorrigible"), but only one — "alignment/interpretability is not solved". (A strong version of corrigibility counts as alignment/interpretability being solved.)

Yes, and that's the specific argument I am addressing, not AI risk in general. Except that if it's many, many times smarter, it's ASI, not AGI.

I disagree that "doom" and "AGI going ASI very fast" are certain (> 90%) too.

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-04T22:41:26.433Z · LW · GW

It's not aligned at every possible point in time.

I think corrigibility is "AGI doesn't try to kill everyone and doesn't try to prevent/manipulate its modification". Therefore, in some global sense such AGI is aligned at every point in time. Even if it causes a local disaster.

Over 90%, as I said

Then I agree, thank you for re-explaining your opinion. But I think other probabilities count as high too.

To me, the ingredients of danger (but not "> 90%") are these:

  • 1st. AGI can be built without Alignment/Interpretability being solved. If that's true, building AGI slowly or being able to fix visible problems may not matter that much.
  • 2nd and 3rd. AGI can have planning ability. AGI can come up with a goal whose pursuit would kill everyone.
  • 2nd (alternative). AIs and AGIs can kill most humans without real intention of doing so, by destabilizing the world/amplifying already existing risks.

If I remember correctly, Eliezer also believes in an "intelligence explosion" (AGI won't be just smarter than humanity, but many, many times smarter than humanity: like humanity is smarter than ants/rats/chimps). Haven't you forgotten to add that assumption?

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-04T02:15:15.674Z · LW · GW

why is “superintelligence + misalignment” highly conjunctive?

In the sense that matters, it needs to be fast, surreptitious, incorrigible, etc.

What opinion are you currently arguing? That the risk is below 90% or something else? What counts as "high probability" for you?

Incorrigible misalignment is at least one extra assumption.

I think "corrigible misalignment" doesn't exist, corrigble AGI is already aligned (unless AGI can kill everyone very fast by pure accident). But we can have differently defined terms. To avoid confusion, please give examples of scenarios you're thinking about. The examples can be very abstract.

If AGI is AGI, there won’t be any problems to notice

Huh?

I mean, you haven't explained what "problems" you're talking about. AGI suddenly declaring "I think killing humans is good, actually" after looking aligned for 1 year? If you didn't understand my response, a more respectful answer than "Huh?" would be to clarify your own statement. What noticeable problems did you talk about in the first place?

Please proactively describe your opinions. Is it too hard to do? A conversation takes two people.

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-03T00:10:05.933Z · LW · GW

I've confused you with people who deny that a misaligned AGI is even capable of killing most humans. Glad to be wrong about you.

But I am not saying that the doom is unlikely given superintelligence and misalignment, I am saying the argument that gets there -- superintelligence + misalignment -- is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.

But I don't agree that it's highly conjunctive.

  • If AGI is possible, then its superintelligence is a given. Superintelligence fails to be a given only if AGI stops at the human level of intelligence + can't think much faster than humans + can't integrate the abilities of narrow AIs naturally. (I.e. if AGI is basically just a simulation of a human and has no natural advantages.) I think most people don't believe in such AGI.
  • I don't think misalignment is highly conjunctive.

I agree that hard takeoff is highly conjunctive, but why is "superintelligence + misalignment" highly conjunctive?

I think it's needed for the "likely". Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn't that obvious?

If AGI is AGI, there won't be any problems to notice. That's why I think probability doesn't decrease enough.

...

I hope that Alignment is much easier to solve than it seems. But I'm not sure (a) how much weight to put into my own opinion and (b) how much my probability of being right decreases the risk.

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-01T22:29:28.858Z · LW · GW

Yes, I probably mean something other than ">90%".

[lists of various catastrophes, many of which have nothing to do with AI]

Why are you doing this? I did not say there is zero risk of anything. (...) Are you using "risk" to mean the probability of the outcome, or the impact of the outcome?

My argument is based on comparing the phenomenon of AGI to other dangerous phenomena. The argument is intended to show that a bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that the impact of the outcome can kill most humans.

I think it's needed for the "likely". Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn't that obvious?

To me the likelihood doesn't go down enough (to tolerable levels).

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-27T05:01:51.965Z · LW · GW

Informal logic is more holistic than not, I think, because it relies on implicit assumptions.

It's not black and white. I don't think they are zero risk, and I don't think it is Certain Doom, so it's not what I am talking about. Why are you bringing it up? Do you think there is a simpler argument for Certain Doom?

Could you proactively describe your opinion? Or re-describe it, by adding relevant details. You seemed to say "if hard takeoff, then likely doom; but hard takeoff is unlikely, because hard takeoff requires a conjunction of things to be true". I answered that I don't think hard takeoff is required. You didn't explain that part of your opinion. Now it seems your opinion is more general (not focused on hard takeoff), but you refuse to clarify it. So, what is the actual opinion I'm supposed to argue with? I won't try to use every word against you, so feel free to write more.

Doom meaning what? It's obvious that there is some level of risk, but some level of risk isn't Certain Doom. Certain Doom is an extraordinary claim, and the burden of proof therefore is on (certain) doomers. But you seem to be switching between different definitions.

I think "AGI is possible" or "AGI can achieve extraordinary things" is the extraordinary claim. The worry about its possible extraordinary danger is natural. Therefore, I think AGI optimists bear the burden of proving that a) likely risk of AGI is bounded by something and b) AGI can't amplify already existing dangers.

By "likely doom" I mean likely (near-)extinction. "Likely" doesn't have to be 90%.

Saying “the most dangerous technology with the worst safety and the worst potential to control it” doesn't actually imply a high level of doom (p > .9) or a high level of risk (> 90% dead) -- it's only a relative statement.

I think it does imply so, modulo "p > 90%". Here's a list of the most dangerous phenomena: (L1)

  • Nuclear warfare. World wars.
  • An evil and/or suicidal world-leader.
  • Deadly pandemics.
  • Crazy ideologies, e.g. fascism. Misinformation. Addictions. People being divided on everything. (Problems of people's minds.)

And a list of the most dangerous qualities: (L2)

  • Being superintelligent.
  • Wanting, planning to kill everyone.
  • Having a cult-following. Humanity being dependent on you.
  • Having direct killing power (like a deadly pandemic or a set of atomic bombs).
  • Multiplicity/simultaneity. E.g. if we had TWO suicidal world-leaders at the same time.

Things from L1 can barely scrape two points from L2, yet they can cause mass disruptions, claim many victims and also trigger each other. Narrow AI could secure three points from the list (narrow superintelligence + cult-following, dependency + multiplicity/simultaneity) — weakly, but potentially better than a powerful human ever could. However, AGI can easily secure three points from L2 in full. Four points, if AGI is developed in more than a single place. And I expect you to grant that general superintelligence presents a special, unpredictable danger.

Given that, I don't see what should bound the risk from AGI or prevent it from amplifying already existing dangers.

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-26T02:11:32.707Z · LW · GW

Why? I'm saying p(doom) is not high. I didn't mention P(otherstuff).

To be able to argue something (/decide how to go about arguing something), I need to have an idea about your overall beliefs.

That doesn't imply a high probability of mass extinction.

Could you clarify what your own opinion even is? You seem to agree that rapid self-improvement would mean likely doom. But you aren't worried about gradual self-improvement or AGI being dangerously smart without much (self-)improvement?

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-25T01:09:52.357Z · LW · GW

I think I have already answered that: I don't think anyone is going to deliberately build something they can't control at all. So the probability of mass extinction depends on creating an uncontrollable superintelligence accidentally -- for instance, by rapid recursive self improvement. And RRSI, AKA Foom Doom, is a conjunction of claims, all of which are p<1, so it is not high probability.

I agree that the probability mostly depends on accidental AGI. I don't agree that the probability mostly depends on (very) hard takeoff. I believe the probability mostly depends on just "AGI being smarter than all of humanity". If you have a kill-switch or whatever, an AGI without Alignment theory being solved is still "the most dangerous technology with the worst safety and the worst potential to control it".

So, could you go into more cruxes of your beliefs, more context? (The more or less full context of my own beliefs is captured by the previous comment. But I'm ready to provide more if needed.) To provide more context to your beliefs, you could try answering "what's the worst disaster (below everyone being dead) an AGI is likely to cause" or "what's the best benefit an AGI is likely to give". To make sure you aren't treating an AGI as impotent in negative scenarios and as a messiah in positive scenarios. Or not treating humans as incapable of sinking even a safe non-sentient boat and refusing to vaccinate against viruses.

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-24T02:30:01.406Z · LW · GW

I want to discuss this topic with you iff you're ready to proactively describe the cruxes of your own beliefs. I believe in likely doom and I don't think the burden of proof is on "doomers".

Maybe there just isn't a good argument for Certain Doom (or at least high probability near-extinction). I haven't seen one

What do you expect to happen when you're building uninterpretable technology without safety guarantees, smarter than all of humanity? Looks like the most dangerous technology with the worst safety and the worst potential to control it.

To me, those abstract considerations are enough a) to conclude likely doom and b) to justify common folk in blocking AI capability research — if common folk could do so.

I believe experts should have accountability (even before a disaster happens) and owe some explanation of what they're doing. If an expert is saying "I'm building the most impactful technology without safety but that's suddenly OK this time around because... ... I can't say, you need to be an expert to understand", I think it's OK to not accept the answer and block the research.

Comment by Q Home on [Bias] Restricting freedom is more harmful than it seems · 2023-11-23T07:43:38.899Z · LW · GW

You are correct that critical thinkers may want to censor uncritical thinkers. However, independent-minded thinkers do not want to censor conventional-minded thinkers.

I still don't see it. I don't see a causal mechanism that would produce it. Even if we replace "independent-minded" with "independent-minded and valuing independent-mindedness for everyone". I have the same problems with it as Ninety-Three and Raphael Harth.

To give my own example: algorithms in social media could be a little too good at radicalizing and connecting people with crazy opinions, such as flat earth. A person censoring such algorithms/their output could be motivated by the desire to make people more independent-minded.

I deliberately avoided examples for the same reason Paul Graham's What You Can't Say deliberately avoids giving any specific examples: because either my examples would be mild and weak (and therefore poor illustrations) or they'd be so shocking (to most people) they'd derail the whole conversation. (comment)

I think the value of a general point can only stem from re-evaluating specific opinions. Therefore, sooner or later the conversation has to tackle specific opinions.

If "derailment" is impossible to avoid, then "derailment" is a part of the general point. Or there are more important points to be discussed. For example, if you can't explain to cave people General Relativity, maybe you should explain "science" and "language" first — and maybe those tangents are actually more valuable than General Relativity.

I dislike Graham's essay for the same reason: when Graham does introduce some general opinions ("morality is like fashion", "censoring is motivated by the fear of free-thinking", "there's no prize for figuring out quickly", "a statement can't be worse than false"), they're not discussed critically, with examples. The "Re: What You Can't Say" format looks weird to me: invisible opponents are allowed to say only one sentence, and each sentence gets a lengthy "answer" with more opinions.

Comment by Q Home on [Bias] Restricting freedom is more harmful than it seems · 2023-11-22T10:47:25.365Z · LW · GW

We only censor other people more-independent-minded than ourselves. (...) Independent-minded people do not censor conventional-minded people.

I'm not sure that's true. Not sure I can interpret the "independent/dependent" distinction.

  • In "weirdos/normies" case, a weirdo can want to censor ideas of normies. For example, some weirdos in my country want to censor LGBTQ+ stuff. They already do.
  • In "critical thinkers/uncritical thinkers" case, people with more critical thinking may want to censor uncritical thinkers. (I believe so.) For example, LW in particular has a couple of ways to censor someone, direct and indirect.

In general, I like your approach of writing this post like an "informal theorem".

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T10:16:49.738Z · LW · GW

I tried to describe the conditions which are necessary for society and culture to exist. Do you agree that what I've described are necessary conditions?

I realize I'm pretty unusual in this regard, which may be biasing my views. However, I think I am possibly evidence against the notion that a desire to leave a mark on the culture is fundamental to human identity.

The relevant part of my argument was "if your personality gets limitlessly copied and modified, your personality doesn't exist (in the cultural sense)". You're talking about something different: about ambitions and the desire for fame.


My thesis (to not lose the thread of the conversation):

If human culture and society are natural, then the rights about information are natural too, because culture/society can't exist without them.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T08:38:37.190Z · LW · GW

I think we can just judge by the consequences (here "consequences" don't have to refer to utility calculus). If some way of "injecting" art into culture is too disruptive, we can decide to not allow it. It doesn't matter who makes the injection or how.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T08:28:54.118Z · LW · GW

To exist — not only for itself, but for others — a consciousness needs a way to leave an imprint on the world. An imprint which could be recognized as conscious. It's a similar thing with personality. For any kind of personality to exist, that personality should be able to leave an imprint on the world. An imprint which could be recognized as belonging to an individual.

Uncontrollable content generation can, in principle, undermine the possibility for consciousness to be "visible" and undermine the possibility of any kind of personality/individuality. And without those things we can't have any culture or society except a hivemind.

Are you OK with such disintegration of culture and society?

In general, I think people have a right to hear other people, but not a right to be heard.

To me that's very repugnant, if taken to the absolute. What emotions and values motivate this conclusion? My own conclusions are motivated by caring about culture and society.


Alternatively, it could be the case that the artist has more to say that isn't or can't be expressed by the imitations - other ideas, interesting self expression, and so on - but the imitations prevent people from finding that new work. I think that case is a failure of whatever means people are using to filter and find art. A good social media algorithm or friend group who recommend content to each other should recognize that the inventor of a good idea might invent other good ideas in the future, and should keep an eye out for and platform those ideas if they do.

I was going for something slightly more subtle. Self-expression is about making a choice. If all choices are realized before you have a chance to make them, your ability to express yourself is undermined.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T00:56:35.991Z · LW · GW

Thank you for the answer, clarifies your opinion a lot!

Artistic expression, of course, is something very different. I'm definitely going to keep making art in my spare time for the rest of my life, for the sake of fun and because there are ideas I really want to get out. That's not threatened at all by AI.

I think there are some threats, at least hypothetical ones. For example, the "spam attack". People see that a painter starts to explore some very niche topic — and thousands of people start to generate thousands of paintings about the same very niche topic. And the very niche topic gets "pruned" in a matter of days, long before the painter has said even 30% of what they have to say. The painter has to fade into obscurity or radically reinvent themselves after every couple of paintings. (Pre-AI, the "spam attack" is not really possible, even with zero copyright laws.)

In general, I believe that for culture to exist we need to respect, in some way, the idea "there's a certain kind of output I can get only from a certain person, even if it means waiting or not having every single one of my desires fulfilled". For example, maybe you shouldn't use AI to "steal" an actor's face and make them play whatever you want.

Do you think that unethical ways to produce content exist at least in principle? Would you consider any boundary for content production, codified or not, to be a zero-sum competition?

Comment by Q Home on It's OK to be biased towards humans · 2023-11-19T10:03:00.041Z · LW · GW

Maybe I've misunderstood your reply, but I wanted to say that hypothetically even humans can produce art in non-cooperative and disruptive ways, without breaking existing laws.

Imagine a silly hypothetical: one of the best human artists gets a time machine and starts offering their art for free. That artist functions like an image generator. Is such an artist doing something morally questionable? I would say yes.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-19T09:31:13.686Z · LW · GW

Could you explain your attitudes towards art and art culture more in depth and explain how exactly your opinions on AI art follow from those attitudes? For example, how much do you enjoy making art and how conditional is that enjoyment? How much do you care about self-expression, in what way? I'm asking because this analogy jumped out at me as a little suspicious:

And as terrible as this could be for my career, spending my life working in a job that could be automated but isn't would be as soul-crushing as being paid to dig holes and fill them in again. It would be an insultingly transparent facsimile of useful work.

But creative work is not mechanical work; it can't be automated that way, and AI doesn't replace you that way. AI doesn't have a model of your brain; it can't make the choices you would make. It replaces you by making something cheaper and on the same level of "quality". It doesn't automate your self-expression. If you care about self-expression, the possibility of AI doesn't have to feel soul-crushing.

I apologize for sounding confrontational. You're free to disagree with everything above. I just wanted to show that the question has a lot of potential nuances.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-19T08:51:22.433Z · LW · GW

I like the angle you've explored. Humans are allowed to care about humans — and propagate that caring beyond its most direct implications. We're allowed to care not only about humans' survival, but also about human art and human communication and so on.

But I think another angle is also relevant: there are just cooperative and non-cooperative ways to create art (or any other output). If AI creates art in non-cooperative ways, it doesn't matter how the algorithm works or if it's sentient or not.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-12T07:21:47.626Z · LW · GW

Thus, it doesn't matter in the least if it stifles human output, because the overwhelming majority of us who don't rely on our artistic talent to make a living will benefit from a post-scarcity situation for good art, as customized and niche as we care to demand.

How do you know that? Art is one of the biggest outlets of human potential; one of the biggest forces behind human culture and human communities; one of the biggest communication channels between people.

One doesn't need to be a professional artist to care about all that.

Comment by Q Home on Open Thread – Autumn 2023 · 2023-11-06T11:10:52.819Z · LW · GW

I think you're going for the most trivial interpretation instead of trying to explore the interesting/unique aspects of the setup. (Not implying any blame. And those "interesting" aspects may not actually exist.) I'm not good at math, but not so bad that I don't know the most basic 101 idea of multiplying utilities by probabilities.

I'm trying to construct a situation (X) where the normal logic of probability breaks down, because each possibility is embodied by a real person and all those persons are in a conflict with each other.

Maybe it's impossible to construct such a situation, for example because any normal situation can be modeled the same way (different people in different worlds who don't care about each other or even hate each other). But the possibility of such a situation is an interesting topic we could explore.

Here's another attempt to construct "situation X":

  • We have 100 persons.
  • 1 person has a 99% chance to get a big reward and a 1% chance to get nothing. If they drink.
  • 99 persons each have a 0.0001% chance to get a big punishment and a 99.9999% chance to get nothing.

Should a person drink? The answer "yes" is a policy which will always lead to exploiting 99 persons for the sake of 1 person. If all those persons hate each other, their implicit agreement to such a policy seems strange.
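To make the asymmetry concrete, here is a minimal sketch (mine, not from the original comment) of the per-person expected utilities under the universal "drink" policy; the reward and punishment magnitudes are made-up assumptions:

```python
# A minimal sketch of the 100-person setup above, with made-up utility
# magnitudes for the "big reward" and the "big punishment".
REWARD = 100.0
PUNISHMENT = -100.0

# Under the policy "everyone drinks":
# person 1 gets the reward with probability 0.99,
# each of the 99 others risks the punishment with probability 0.000001 (0.0001%).
ev_person_1 = 0.99 * REWARD
ev_each_of_99 = 0.000001 * PUNISHMENT

print(f"Expected utility, person 1:       {ev_person_1:+.4f}")
print(f"Expected utility, each of the 99: {ev_each_of_99:+.6f}")
# All of the expected gain goes to one person; the other 99 only ever bear risk.
```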


Here's an explanation of what I'd like to explore from another angle.

Imagine I have a 99% chance to get a reward and a 1% chance to get a punishment if I take a pill. I'll take the pill. If we imagine that each possibility is a separate person, this decision can be interpreted in two ways:

  • 1 person altruistically sacrifices their well-being for the sake of 99 other persons.
  • 100 persons each think, egoistically, "I can get lucky". Only 1 person is mistaken.

And the same is true for other situations involving probability. But is there any situation (X) which could differentiate between "altruistic" and "egoistic" interpretations?

Comment by Q Home on Open Thread – Autumn 2023 · 2023-11-05T23:01:07.767Z · LW · GW

For all intents and purposes it's equivalent to say "you have only one shot", and after memory erasure it's not you anymore, but a person equivalent to the other version of you in the next room.

Let's assume "it's not you anymore" is false. At least for a moment (even if it goes against LDT or something else).

Yes, you have a 0.1 chance of being punished. But who cares if they will erase your memory anyway.

Let's assume that the persons do care.

Comment by Q Home on Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it) · 2023-09-28T06:52:16.776Z · LW · GW

To me, the initial poll options make no sense without each other. For example, "avoid danger" and "communicate beliefs" don't make sense without each other [in context of society].

If people can't communicate (report their epistemic state), "avoid danger" may not help, or it may be based on 100% biased opinions about what's dangerous.

  • If some people solve Alignment, but don't communicate, humanity may perish due to not building a safe AGI.
  • If nobody solves Alignment, but nobody communicates about Alignment, humanity may perish because careless actors build an unsafe AGI without even knowing they do something dangerous.

I like communication, so I chose the second option. Even though "communicating without avoiding danger" doesn't make sense either.

Since the poll options didn't make much sense to me, I didn't see myself as "facing alien values" or "fighting off babyeaters". I didn't press the link because I thought it might "blow up" the site (similar to the previous Petrov Day), and I wasn't sure it was OK to click. I didn't think my unilateralism would be analogous to Petrov's unilateralism (did Petrov cure anyone's values, by the way?). I decided it was more Petrov-like not to click.


But is AGI (or anything else) related to the lessons of Petrov Day? That's another can of worms. I think we should update the lessons of the past to fit future situations. I think it doesn't make much sense to take away from Petrov Day only lessons about "how to deal with launching nukes".

Another consideration: Petrov did accurately report his epistemic state. Or would have, if it were needed (if it were needed, he would lie to accurately report his epistemic state - "there are no launches"). Or "he accurately non-reported the non-presence of nuclear missiles".

Comment by Q Home on A Case for AI Safety via Law · 2023-09-22T02:31:46.640Z · LW · GW

Maybe you should edit the post to add something like this:

My proposal is not about the hardest parts of the Alignment problem. My proposal is not trying to solve theoretical problems with Inner Alignment or Outer Alignment (Goodhart, loopholes). I'm just assuming those problems won't be relevant enough. Or humanity simply won't create anything AGI-like (see CAIS).

Instead of discussing the usual problems in Alignment theory, I merely argue X. X is not a universally accepted claim; here's evidence that it's not universally accepted: [write the evidence here].

...

By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky's CEV.

I think the key problems are not "addressed"; you just assume they won't exist. And laws are not a "practical implementation of CEV".

Comment by Q Home on A Case for AI Safety via Law · 2023-09-20T22:19:58.930Z · LW · GW

Maybe there's a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It's impossible to reject premise (1) without losing the proposal's meaning.

Premise (1) can be rejected only if you're not solving Alignment but solving some other problem.

I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems.

If an AI can be Aligned externally, then it's already safe enough. It feels like...

  • You're not talking about solving Alignment, but talking about some different problem. And I'm not sure what that problem is.
  • For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.
Comment by Q Home on A Case for AI Safety via Law · 2023-09-20T09:37:30.894Z · LW · GW

Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:

"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with." (comment)

Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:

  • Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).

I think the post fails to argue either point. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1), or that laws actually prevent misalignment (2).

Later I write, "Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary."

If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. "Asking AI to follow something" is not what Bostrom means by direct specification, as far as I understand.

Comment by Q Home on Which Questions Are Anthropic Questions? · 2023-09-18T13:58:51.719Z · LW · GW

I like how you explain your opinion, very clear and short, basically contained in a single bit of information: "you're not a random sample" or "this equivalence between 2 classes of problems can be wrong".

But I think you should focus on describing the opinion of others (in simple/new ways) too. Otherwise you're just repeating yourself over and over.

If you're interested, I could try helping to write a simplified guide to ideas about anthropics.

Comment by Q Home on Some Thoughts on AI Art · 2023-09-18T13:54:05.360Z · LW · GW

Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.

What is the greater framework behind this argument? "Creating art" is one of the most general potentials a human being can realize. With your argument we could justify chopping off every human potential because "there's a greater number of people who don't care about realizing it".

I think deleting a key human potential (and a shared cultural context) affects the entire society.

Comment by Q Home on Open Thread – Autumn 2023 · 2023-09-15T09:31:24.164Z · LW · GW

A stupid question about anthropics and [logical] decision theories. Could we "disprove" some types of anthropic reasoning based on [logical] consistency? I struggle with math, so please keep the replies relatively simple.

  • Imagine 100 versions of me; I'm one of them. We're all egoists: each one of us doesn't care about the others.
  • We're in isolated rooms, and each room has a drink. 90 drinks are rewards, 10 drinks are punishments. Everyone is given the choice to drink or not to drink.
  • The setup is iterated (with memory erasure): everyone gets the same type of drink each time. If you got the reward, you get the reward each time. Only you can't remember that.

If I reason myself into drinking (reasoning that I have a 90% chance of reward), from the outside it would look as if 10 egoists have agreed (very conveniently, to the benefit of others) to suffer again and again... is it a consistent possibility?
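As a sanity check on that intuition, here is a minimal simulation sketch (mine, not part of the original question) of the iterated setup, assuming every copy follows the policy "always drink":

```python
import random

# 100 copies; 90 reward drinks and 10 punishment drinks, assigned once before
# round 1 and fixed across rounds ("everyone gets the same type of drink each time").
random.seed(0)
drinks = ["reward"] * 90 + ["punishment"] * 10
random.shuffle(drinks)

ROUNDS = 1000
punishments = [0] * 100
for _ in range(ROUNDS):
    for i, drink in enumerate(drinks):
        # Memory erasure: each copy re-runs the same reasoning every round
        # ("90% chance of reward, so drink") and therefore always drinks.
        if drink == "punishment":
            punishments[i] += 1

unlucky = [i for i, n in enumerate(punishments) if n > 0]
print(f"Copies ever punished: {len(unlucky)} out of 100, each punished {ROUNDS} times")
# From the outside, the same 10 copies "agree" to suffer in every single round.
```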

Comment by Q Home on Why am I Me? · 2023-09-08T05:32:21.213Z · LW · GW

Let's look at actual outcomes here. If every human says yes, 95% of them get to the afterlife. If every human says no, 5% of them get to the afterlife. So it seems better to say yes in this case, unless you have access to more information about the world than is specified in this problem. But if you accept that it's better to say yes here, then you've basically accepted the doomsday argument.

There's a chance you're changing the nature of the situation by introducing Omega. Often "beliefs" and "betting strategy" go together, but here it may not be the case. You have to prove that the decision in the Omega game has any relation to any other decisions.

There's a chance this Omega game is only "an additional layer of tautology" which doesn't justify anything. We need to consider more games. I can suggest a couple of examples.

Game 1:

Omega: There are 2 worlds, one much more populated than the other. In the bigger one magic exists, in the smaller one it doesn't. Would you bet that magic exists in your world? Would you actually update your beliefs and keep that update?

One person can argue it becomes beneficial to "lie" about your beliefs / adopt temporary doublethink. Another person can argue for permanently changing your mind about magic.

Game 2:

Omega: I have this protocol. When you stand on top of a cliff, I give you a choice to jump or not. If you jump, you die. If you don't, I create many perfect simulations of this situation. If you jump in a simulation, you get a reward. Wanna jump?

You can argue "jumping means death, the reward is impossible to get". Unless you have access to true randomness which can vary across perfect copies of the situation. IDK. Maybe "making the Doomsday update beneficially" is impossible.

You did touch on exactly that, so I'm not sure how much my comment agrees with your opinions.

Comment by Q Home on H5N1. Just how bad is the situation? · 2023-07-09T11:40:46.478Z · LW · GW

The real question is will H5N1 pandemic happen in the next 5-10 years

2.4%

Sorry for a dumb question, but where do those numbers come from? What reasoning stands behind them? Is it some causal story ("jumping to humans is not that easy"), or priors ("pandemics are unlikely"), or some precedent analysis ("it's not the first time a virus has infected so many animal types")?

I really lack knowledge about viruses.

Comment by Q Home on H5N1. Just how bad is the situation? · 2023-07-09T02:00:07.217Z · LW · GW

What exactly, in our rational consideration, keeps the risk relatively low? Is it a prior that calamity-level pandemics happen rarely? Is it the fact (?) that today's situation is not that unique? Is it the hope that the virus can "back down", somehow? Is it some fact about general behavior of viruses?

What are the "cruxes" of "the risk is relatively low" prediction, what events would increase/decrease the risk and how much? For example, what happens with the probability if a lot of mammal-to-mammal transmissions start happening? Maybe I've missed it, but Zvi doesn't seem to address such points. I feel utterly confused. As if I'm missing an obvious piece of context which "nobody is talking about".

I have little knowledge about viruses. How unique is it for a virus to be deadly (and already a deadly threat to humans), epizootic (epidemic in non-humans) and panzootic (affecting animals of many species, especially over a wide area)? (Definitions from the Wikipedia article.)

The most naive, over-reactive and highly likely misinformed take would be "we are in a unique situation in history (in terms of viruses), more unique than the Spanish flu and the Black Death, because the latter weren't (?) widespread among non-humans. There are some dice rolls which separate us from disaster, but all possible dice rolls are now happening daily for days and months (and years)." ... What makes all the factors cash out into "anyway, the risk is relatively low, just one digit"? Here's an analogy: from a naive outside perspective, H5N1's "progress" may seem as impressive as ChatGPT's. "This never (?) happened before, but suddenly it happened, and from this point on things can only escalate (probably)" - I guess for an outsider it's easy to get an impression like this. I feel confused because I'm not seeing it directly addressed.

Comment by Q Home on Ideas of the Gaps · 2023-07-04T05:16:00.375Z · LW · GW

So they overlook the simpler patterns because they pay less rent upfront, even though they are more general and a better investment long-term.

...

And if you use this metaphor to imagine what's going to happen to a tiny drop of water on a plastic table, you could predict that it will form a ball and refuse to spread out. While the metaphor may only be able to generate very uncertain & imprecise predictions, it's also more general.

Can you expand on this thought ("something can give less specific predictions, but be more general") or reference famous/professional people discussing it? This thought can be very trivial, but it can also be very controversial.

Right now I'm writing a post about "informal simplicity", "conceptual simplicity". It discusses the simplicity of informal concepts (concepts which don't give specific predictions). I make an argument that "informal simplicity" should be very important a priori. But I don't know if "informal simplicity" has been used (at least implicitly) by professional and famous people. Here's as much as I know: (warning, controversial and potentially inaccurate takes!)

  • Zeno of Elea made arguments basically equivalent to "calculus should exist" and "theory of computation should exist" ("supertasks are a thing") using only basic math.

  • The success of neural networks is a success of some of the simplest mechanisms: backpropagation and attention. (Even though they can be heavy on math too.) We observed a complicated phenomenon (real neurons), we simplified it... and BOOM!

  • Arguably, many breakthroughs in early and late science were sealed behind simple considerations (e.g. the equivalence principle), not deduced from formal reasoning. Feynman diagrams weren't deduced from some specific math; they came from the desire to simplify.

  • Some fields "simplify each other" in some way. Physics "simplifies" math (via physical intuitions). Computability theory "simplifies" math (by limiting it to things which can be done by series of steps). Rationality "simplifies" philosophy (by connecting it to practical concerns) and science.

  • To learn to fly, the Wright brothers had to analyze "simple" considerations.

  • Eliezer Yudkowsky influenced many people with very "simple" arguments. The rationalist community as a whole is a "simplified" approach to philosophy and science (to a degree).

  • The possibility of a logical decision theory can be deduced from simple informal considerations.

  • Albert Einstein used simple thought experiments.

  • Judging by the famous video interview, Richard Feynman liked to think about simple informal descriptions of physical processes. And maybe Feynman talked about the "less precise, but more general" idea? Maybe he said that epicycles were more precise, but a heliocentric model was better anyway? I couldn't find it.

  • Terry Tao occasionally likes to simplify things. (e.g. P=NP and multiple choice exams, Quantum mechanics and Tomb Raider, Special relativity and Middle-Earth and Calculus as “special deals”). Is there more?

  • Some famous scientists didn't shy away from philosophy (e.g. Albert Einstein, Niels Bohr?, Erwin Schrödinger).

Please, share any thoughts or information relevant to this, if you have any! It's OK if you write your own speculations/frames.

Comment by Q Home on Three levels of exploration and intelligence · 2023-03-16T22:17:14.668Z · LW · GW

If you have a flexible enough representation then you can use it to represent anything, unfortunately you've also gutted it of predictive power (vs post hoc explanation).

I think this can be wrong:

  1. "Y" and "D" are not empty symbols, they come with an objective enough metric (the metric of "general importance"). So, it's like saying that "A" and "B" in the Bayes' theorem are empty symbols without predictive power. And I believe the analogy with Bayes' theorem is not accidental, by the way, because I think you could turn my idea into a probabilistic inference rule.
  2. If my method can't help to predict good ideas, it still can have predictive power if it evaluates good ideas correctly (before they get universally recognized as good). Not every important idea is immediately recognized as important.

Can you expand on the connection with Leverage Points? It seems like the "12 Leverage Points" idea is extremely specific and complicated (that doesn't mean it can't be good in its own field, though).

Comment by Q Home on Q Home's Shortform · 2023-03-08T22:26:45.614Z · LW · GW

*A more "formal" version of the draft (it's a work in progress): *

There are two interpretations of this post, weak and strong.

Weak interpretation:

I describe a framework about "thee levels of exploration". I use the framework to introduce some of my ideas. I hope that the framework will give more context to my ideas, making them more understandable. I simply want to find people who are interested in exploring ideas. Exploring just for the sake of exploring or for a specific goal.

Strong interpretation:

I use the framework as a model of intelligence. I claim that any property of intelligence boils down to the "three levels of exploration". Any talent, any skill. The model is supposed to be "self-evident" because of its simplicity; it's not based on a direct analysis of famous smart people.

Take the strong interpretation with a lot of grains of salt, of course, because I'm not an established thinker and I haven't achieved anything intellectual. I just thought "hey, this is a funny little simple idea, what if all intelligence works like this?", that's all.

That said, I'll need to make a couple of extraordinary claims "from inside the framework" (i.e. assuming it's 100% correct and 100% useful). Just because that's in the spirit of the idea. Just because it allows exploring the idea to its logical conclusion. Definitely not because I'm a crazy man. You can treat the most outlandish claims as sci-fi ideas.

A formula of thinking?

Can you "reduce" thinking to a single formula? (Sounds like cringe and crackpottery!)

Can you show a single path of the best and fastest thinking?

Well, there's an entire class of ideas which attempt to do this in different fields, especially the first idea:

My idea is just another attempt at reduction. You don't have to treat such attempts 100% seriously in order to find value in them. You don't have to agree with them.

Three levels of exploration

Let's introduce my framework.

In any topic, there are three levels of exploration:

  1. You study a single X.
  2. You study types of different X. Often I call those types "qualities" of X.
  3. You study types of changes (D): in what ways different X change/get changed by a new thing Y. Y and D need to be important even outside of the (main) context of X.

The point is that at the 2nd level you study similarities between different X directly, but at the 3rd level you study similarities indirectly through new concepts Y and D. The letter "D" means "dynamics".

I claim that any property of intelligence can be boiled down to your "exploration level". Any talent, any skill and even more vague things such as "level of intentionality". I claim that the best and most likely ideas come from the 3rd level. That 3rd level defines the absolute limit of currently conceivable ideas. So, it also indirectly defines the limit of possible/conceivable properties of reality.

You don't need to trust those extraordinary claims. If the 3rd level simply sounds interesting enough to you and you're ready to explore it, that's good enough.

Three levels simplified

A vague description of the three levels:

  1. You study objects.
  2. You study qualities of objects.
  3. You study changes of objects.

Or:

  1. You study a particular thing.
  2. You study everything.
  3. You study abstract ways (D) in which the thing is changed by "everything".

Or:

  1. You study a particular thing.
  2. You study everything.
  3. You study everything through a particular thing.

So yeah, it's a Hegelian dialectic rip-off. Down below are examples of applying my framework to different topics. You don't need to read them all, of course.


Exploring debates

1. Argumentation

I think there are three levels of exploring arguments:

  1. You judge arguments as right or wrong. Smart or stupid.
  2. You study types of arguments. Without judgement.
  3. You study types of changes (D): how arguments change/get changed by some new thing Y. ("dynamics" of arguments)

If you want to get a real insight about argumentation, you need to study how (D) arguments change/get changed by some new thing Y. D and Y need to be important even outside of the context of explicit argumentation.

For example, Y can be "concepts". And D can be "connecting/separating" (a fundamental process which is important in a ton of contexts). You can study in what ways arguments connect and separate concepts.

A simplified political example: a capitalist can tend to separate concepts ("bad things are caused by mistakes and bad actors"), while a socialist can tend to connect concepts ("bad things are caused by systemic problems"). Conflict Vs. Mistake^(1) is just a very particular version of this dynamic. Different manipulations with concepts create different arguments and different points of view. You can study all such dynamics. You can trace arguments back to fundamental concept manipulations. It's such a basic idea, and yet nobody has done it for informal argumentation. Aristotle did it 2400 years ago, but for formal logic.

^(1. I don't agree with Scott Alexander, by the way.)

Arguments: conclusion

I think most of us are at level 1 in argumentation: we throw arguments at each other like angry cavemen, without studying what an "argument" is and/or what dynamics it creates. If you completely unironically think that "stupid arguments" exist, then you're probably on the 1st level. Professional philosophers are at level 2 at best, but usually lower (they are surprisingly judgemental). At least they are somewhat forced to be tolerant of the most diverse types of arguments due to their profession.

On what level are you? Have you studied arguments without judgement?

2. Understanding/empathy

I think there are three levels in understanding your opponent:

  1. You study a specific description (X) of your opponent's opinion. You can pass the Ideological Turing Test in a superficial way. Like a parrot.
  2. You study types of descriptions of your opponent's opinion. ("Qualities" of your opponent's opinion.) You can "inhabit" the emotions/mindset of your opponent.
  3. You study types of changes (D): how the description of your opponent's opinion changes/get changed by some new thing Y. D and Y need to be important even outside of debates.

For example, Y can be "copies of the same thing" and D can be "transformations of copies into each other". Such Y and D are important even outside of debates.

So, on the 3rd level you may be able to describe the opponent's position as a weaker version/copy of your own position (Y) and clearly imagine how your position could turn out to be "the weaker version/copy" of the opponent's views. You can imagine how opponent's opinion transforms into truth and your opinion transforms into a falsehood (D).

Other interesting choices of Y and D are possible. For example, Y can be "complexity of the opinion [in a given context]"; D can be "choice of the context" and "increasing/decreasing of complexity". You can run the opinion of your opponent through different contexts and see how much it reacts to/accommodates the complexity of the world.

Empathy: conclusion

I think people very rarely do the 3rd level of empathy.

Doing it systematically would lead to a new political/epistemological paradigm.


Exploring philosophy

1. Beliefs and ontology

I think there are three levels of studying the connection between beliefs and ontology:

  1. You think you can see the truth of a belief directly. For example, you can say "all beliefs which describe reality in a literal way are true". You get stuff like Naïve Realism. "Reality is real."
  2. You study types of beliefs. You can say that all beliefs of a certain type are true. For example, "all mathematical beliefs are true". You get stuff like the Mathematical Universe Hypothesis, Platonism, Ontic Structural Realism... "Some description of reality is real."
  3. You study types of changes (D): how beliefs change/get changed by some new thing Y. You get stuff like Berkeley’s subjective idealism and radical probabilism and Bayesian epistemology: the world of changing ideas. "Some changing description of reality is real."

What can D and Y be? Both things need to be important even outside of the context of explicit beliefs. A couple of versions:

  • Y can be "semantic connections". D can be "connecting/separating [semantic connections]". Both things are generally important, for example in linguistics, in studying semantic change. We get Berkeley's idealism.
  • Y can be "probability mass" or some abstract "weight". D can be "distribution of the mass/weight". We get probabilism/Bayesianism.

Thinking at the level of semantic connections should be natural to people, because they use natural language and... neural nets in their brains! (Berkeley makes a similar argument: "hey, folks, this is just common sense!") And yet this idea is extremely alien to people epistemology-wise and ontology-wise. I think the true potential of the 3rd level remains unexplored.

Beliefs: conclusion

I think most rationalists (Bayesians, LessWrong people) are "confused" between the 2nd level and the 1st level, even though they have some 3rd level tools.

Eliezer Yudkowsky is "confused" between the 1st level and the 3rd level: he likes level 1 ideas (e.g. "map is not the territory"), but has a bunch of level 3 ideas ("some maps are the territory") about math, probability, ethics, decision theory, Security Mindset...

2. Ontology and reality

I think there are three levels of exploring the relationship between ontologies and reality:

  1. You think that an ontology describes the essence of reality.
  2. You study how different ontologies describe different aspects of reality.
  3. You study types of changes (D): how ontologies change/get changed by some other concept Y. D and Y need to be important even outside of the topic of (pure) ontology.

Y can be "human minds" or simply "objects". D can be "matching/not matching" or "creating a structure" (two very basic, but generally important processes). You get Kant's "Copernican revolution" (reality needs to match your basic ontology, otherwise information won't reach your mind: but there are different types of "matching" and transcendental idealism defines one of the most complicated ones) and Ontic Structural Realism (ontology is not about things, it's about structures created by things) respectively.

On what level are you? Have you studied ontologies/epistemologies without judgement? What are the most interesting ontologies/epistemologies you can think of?

3. Philosophy overall

I think there are three levels of doing philosophy in general:

  1. You try to directly prove an idea in philosophy using specific philosophical tools.
  2. You study types of philosophical ideas.
  3. You study types of changes (D): how philosophical ideas change/get changed by some other thing Y. D and Y need to be important even outside of (pure) philosophy.

To give a bunch of examples, Y can be:

I think people have done a lot of 3rd-level philosophy, but we haven't fully committed to the 3rd level yet. We are used to treating philosophy as a closed system, even when we make significant steps outside of that paradigm.


Exploring ethics

1. Commitment to values

I think there are three levels of values:

  1. Real values. You treat your values as particular objects in reality.

  2. Subjective values. You care only about things inside of your mind. For example, do you feel good or not?

  3. Semantic values. You care about types of changes (D): how your values change/get changed by reality (Y). Your value can be expressed as a combination of the three components: "a real thing + its meaning + changes".

Example of a semantic value: you care about your friendship with someone. You will try to preserve the friendship, but in a limited way: you accept that one day the relationship may end naturally (your value may "die" a natural death). Semantic values are temporal and path-dependent. Semantic values are like games embedded in reality: you want to win the game without breaking the rules.

2. Ethics

I think there are three levels of analyzing ethics:

  1. You analyze norms of specific communities and desires of specific people. That's quite easy: you are just learning facts.
  2. You analyze types of norms and desires. You are lost in contradictory implications, interpretations and generalizations of people's values. You have a meta-ethical paralysis.
  3. You study types of changes (D): how norms and desires change/get changed by some other thing Y. D and Y need to be important even outside of (purely) ethical context.

Ethics: tasks and games

For example, Y can be "tasks, games, activities" and D can be "breaking/creating symmetries". You can study how norms and desires affect properties of particular activities.

Let's imagine an Artificial Intelligence or a genie who fulfills our requests (it's a "game" between us). We can analyze how bad actions of the genie can break important symmetries of the game. Let's say we asked it to make us a cup of coffee:

  • If it killed us after making the coffee, we can't continue the game. And we ended up with less than we had before. And we wouldn't have made the request if we had known that was going to happen. And the game can't be "reversed": the players are dead.

  • If it has taken us under mind control, we can't affect the game anymore (and it gained 100% control over the game). If it placed us into a delusion, then the state of the game can be arbitrarily affected (by dissolving the illusion) and depends on perspective.

  • If it made us addicted to coffee, we can't stop or change the game anymore. And the AI/genie drastically changed the nature of the game without our consent. It changed how the "coffee game" relates to all other games, skewed the "hierarchy of games".

Those are all "symmetry breaks". And such symmetry breaks are bad in most of the tasks.
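To make this a bit more concrete, here is a minimal toy sketch (in Python) of how one might check which "symmetries" of the coffee game a given action breaks. The four flags and the action descriptions are my own illustrative simplifications of the examples above, not a real formalization.

```python
# Toy sketch: represent the "coffee game" after an action and check which
# basic symmetries of the game the action has preserved. The specific flags
# are illustrative stand-ins for the symmetry breaks described above.

from dataclasses import dataclass

@dataclass
class GameState:
    players_alive: bool       # can the game continue at all?
    players_in_control: bool  # can players still affect or stop the game?
    reversible: bool          # can the players undo what happened?
    hierarchy_intact: bool    # does the "coffee game" keep its normal place among other games?

def broken_symmetries(state: GameState) -> list[str]:
    """Return the list of game symmetries an action has broken."""
    breaks = []
    if not state.players_alive:
        breaks.append("game cannot continue (players are gone)")
    if not state.players_in_control:
        breaks.append("players cannot affect or stop the game")
    if not state.reversible:
        breaks.append("the action cannot be reversed")
    if not state.hierarchy_intact:
        breaks.append("the game's place among other games was skewed")
    return breaks

# The three genie actions from the examples above, plus the intended one:
actions = {
    "kill after making coffee": GameState(False, False, False, False),
    "mind control / delusion":  GameState(True,  False, False, True),
    "coffee addiction":         GameState(True,  False, True,  False),
    "just make the coffee":     GameState(True,  True,  True,  True),
}

for name, state in actions.items():
    print(name, "->", broken_symmetries(state) or "no symmetry breaks")
```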

Ethics: Categorical Imperative

With Categorical Imperative, Kant explored a different choice of Y and D. Now Y is "roles of people", "society" and "concepts"; D is "universalization" and "becoming incoherent/coherent" and other things.

Ethics: Preferences

If Y is "preferences" and D is "averaging", we get Preference utilitarianism. (Preferences are important even outside of ethics and "averaging" is important everywhere.) But this idea is too "low-level" to use in analysis of ethics.

However, if Y is "versions of an abstract preference" and D is "splitting a preference into versions" and "averaging", then we get a high-level analog of preference utilitarianism. For example, you can take an abstract value such as Bodily autonomy and try to analyze the entirety of human ethics as an average of versions (specifications) of this abstract value.

Preference utilitarianism reduces ethics to an average of micro-values, the idea above reduces ethics to an average of a macro-value.
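A minimal sketch of the contrast, assuming we are happy to represent preferences as made-up numeric scores: in both cases D is plain averaging, only the choice of Y (individual people vs. versions of one abstract value) differs. All names and numbers below are hypothetical.

```python
# Toy sketch contrasting the two choices of Y described above.
# In both cases D is plain averaging; only what gets averaged differs.

def average_score(action: str, scorers: dict[str, dict[str, float]]) -> float:
    """Average an action's score over a set of scoring functions (Y)."""
    return sum(scores[action] for scores in scorers.values()) / len(scorers)

# "Micro" version: Y = individual people's preferences (preference utilitarianism).
people = {
    "alice": {"ban_smoking_indoors": 0.8, "allow_smoking_indoors": 0.2},
    "bob":   {"ban_smoking_indoors": 0.3, "allow_smoking_indoors": 0.7},
}

# "Macro" version: Y = versions (specifications) of one abstract value,
# e.g. different readings of "bodily autonomy".
autonomy_versions = {
    "freedom from others' smoke": {"ban_smoking_indoors": 0.9, "allow_smoking_indoors": 0.1},
    "freedom to use one's body":  {"ban_smoking_indoors": 0.2, "allow_smoking_indoors": 0.8},
}

print(average_score("ban_smoking_indoors", people))             # micro-values averaged
print(average_score("ban_smoking_indoors", autonomy_versions))  # one macro-value's versions averaged
```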

Ethics: conclusion

So, what's the point of the 3rd level of analyzing ethics? The point is to find objective sub-structures in ethics where you can apply deduction to exclude the most "obviously awful" and "maximally controversial and irreversible" actions. The point is to "derive" ethics from much more broad topics, such as "meaningful games" and "meaningful tasks" and "coherence of concepts".

I think:

  • Moral philosophers and Alignment researchers are ignoring the 3rd level. People are severely underestimating how much they know about ethics.
  • Acknowledging the 3rd level doesn't immediately solve Alignment, but it can "solve" ethics or the discourse around ethics. Empirically: just study properties of tasks and games and concepts!
  • Eliezer Yudkowsky has limited 3rd level understanding of meta-ethics ("Abstracted Idealized Dynamics", "Morality as Fixed Computation", "The Bedrock of Fairness") but misses that he could make his ideas broader.
  • Particularism (in ethics and reasoning in general) could lead to the 3rd level understanding of ethics.

Exploring perception

1. Properties

There are three levels of looking at properties of objects:

  1. Inherent properties. You treat objects as having more or less inherent properties. E.g. "this person is inherently smart"

  2. Meta-properties. You treat any property as universal. E.g. "anyone is smart under some definition of smartness"

  3. Semantic properties. You treat properties only as relatively attached to objects. You focus on types of changes (D): how properties and their interpretations change/get changed by some other thing Y. You "reduce" properties to D and Y. E.g. "anyone can be a genius or a fool under certain important conditions" or "everyone is smart, but in a unique and important way"

2. Commitment to experiences and knowledge

I think there are three levels of commitment to experiences:

  1. You're interested in particular experiences.

  2. You want to explore all possible experiences.

  3. You're interested in types of changes (D): how your experience changes/gets changed by some other thing Y. D and Y need to be important even outside of experience.

So, on the 3rd level you care about interesting ways (D) in which experiences correspond to reality (Y).

3. Experience and morality

I think there are three levels of investigating the connection between experience and morality:

  1. You study how experience causes us to do good or bad things.
  2. You study all the different experiences "goodness" and "badness" causes in us.
  3. You study types of changes (D): how your experience changes/gets changed by some other thing Y. D and Y need to be important even outside of experience. But related to morality anyway.

For example, Y can be "[basic] properties of concepts" and D can be "matches / mismatches [between concepts and actions towards them]". You can study how experience affects properties of concepts which in turn bias actions. An example of such analysis: "loving a sentient being feels fundamentally different from eating a sandwich. food taste is something short and intense, but love can be eternal and calm. this difference helps to not treat other sentient beings as something disposable"

I think the existence of the 3rd level isn't acknowledged much. Most versions of moral sentimentalism are 2nd level at best. Epistemic Sentimentalism can be 3rd level in the best case.


Exploring cognition

1. Patterns

I think there are three levels of [studying] patterns:

  1. You study particular patterns (X). You treat patterns as objective configurations in reality.
  2. You study all possible patterns. You treat patterns as subjective qualities of information, because most patterns are fake.
  3. You study types of changes (D): how patterns change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) pattern analysis. You treat a pattern as a combination of the three components: "X + Y + D".

For example, Y can be "pieces of information" or "contexts": you can study how patterns get discarded or redefined (D) when new information gets revealed/new contexts get considered.

You can study patterns which are "objective", but exist only in a limited context. For example, think about your friend's bright personality (personality = a pattern). It's an "objective" pattern, and yet it exists only in a limited context: the pattern would dissolve if you compared your friend to all possible people. Or if you saw your friend in all possible situations they could end up in. Your friend's personality has some basis in reality (X), has a limited domain of existence (Y) and the potential for change (D).
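A toy sketch of the "limited domain" point, under the crude assumption that a "bright personality" can be modelled as an unusually high score relative to whichever comparison group (Y) you happen to use:

```python
# Toy sketch of a pattern with a limited domain of existence.
# All numbers are made up for illustration.

import random
random.seed(0)

def stands_out(person: float, comparison_group: list[float]) -> bool:
    """The 'pattern' X: does this person look exceptional within the domain Y?"""
    return person > max(comparison_group)

friend = 0.92
small_circle = [random.random() for _ in range(10)]      # limited domain Y
whole_world = [random.random() for _ in range(100_000)]  # expanded domain

print(stands_out(friend, small_circle))  # True with this seed: the pattern "exists" locally
print(stands_out(friend, whole_world))   # False: the pattern dissolves when the domain grows (D)
```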

2. Patterns and causality

I think there are three levels in the relationship between patterns and causality. I'm going to give examples about visual patterns:

  1. You learn which patterns are impossible due to local causal processes. For example: "I'm unlikely to see a big tower made of eggs standing on top of each other". It's just not a stable situation due to very familiar laws of physics.

  2. You learn statistical patterns (correlations) which can have almost nothing to do with causality. For example: "people like to wear grey shirts".

  3. You learn types of changes (D): how patterns change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) pattern analysis. And related to causality.

Y can be "basic properties of images" and "basic properties of patterns"; D can be "sharing properties" and "keeping the complexity the same". In simpler words:

On the 3rd level you learn patterns which have strong connections to other patterns and basic properties of images. You could say such patterns are created/prevented by "global" causal processes. For example: "I'm unlikely to see a place fully filled with dogs. dogs are not people or birds or insects, they don't create such crowds or hordes". This is very abstract, connects to other patterns and basic properties of images.

Causality: implications for Machine Learning

I think...

  • It's likely that Machine Learning models don't learn 3rd level patterns as well as they could, as sharply as they could.
  • Machine Learning models should be 100% able to learn 3rd level patterns. It shouldn't require any specific data.
  • Learning/comparing level 3 patterns is interesting enough on its own. It could be its own area of research. But we don't apply statistics/Machine Learning to try mining those patterns. This may be a missed opportunity for humans.

3. Cognitive processes

Suppose you want to study different cognitive processes, skills, types of knowledge. There are three levels:

  1. You study particular cognitive processes.

  2. You study types (qualities) of cognitive processes. And types of types (classifications).

  3. You study types of changes (D): how cognitive processes change/get changed by some other thing Y. D and Y need to be important even without the context of cognitive processes.

For example, Y can be "fundamental configurations / fundamental objects" and D can be "finding a fundamental configuration/object in a given domain". You can "reduce" different cognitive process to those Y and D: (names of the processes below shouldn't be taken 100% literally)

^(1 "fundamental" means "VERY widespread in a certain domain")

  • Causal reasoning learns fundamental configurations of fundamental objects in the real world. So you can learn stuff like "this abstract rule applies to most objects in the world".
  • Symbolic reasoning learns fundamental configurations of fundamental objects in your "concept space". So you can learn stuff like ""concept A containing concept B" is an important pattern" (see set relations).
  • Correlational reasoning learns specific configurations of specific objects.
  • Mathematical reasoning learns specific configurations of fundamental objects. So you can build arbitrary structures with abstract building blocks.
  • Self-aware reasoning can transform fundamental objects into specific objects. So you can think thoughts like, for example, "maybe I'm just a random person with random opinions" (you consider your perspective as non-fundamental) or "maybe the reality is not what it seems".

I know, this looks "funny", but I think all this could be easily enough formalized. Isn't that a natural way to study types of reasoning? Just ask what knowledge a certain type of reasoning learns!
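As a very rough sketch of what such a formalization might start from, here is the classification above written out as data, with each reasoning type reduced to a pair of "how fundamental the objects are" and "how fundamental the configurations are". The labels are just my shorthand for the descriptions above (self-aware reasoning is left out, since it is a transformation rather than a pair):

```python
# Toy sketch of the classification above. "fundamental" loosely means
# "very widespread in its domain"; the labels are illustrative only.

reasoning_types = {
    # name: (objects, configurations, domain)
    "causal":        ("fundamental", "fundamental", "the real world"),
    "symbolic":      ("fundamental", "fundamental", "concept space"),
    "correlational": ("specific",    "specific",    "the real world"),
    "mathematical":  ("fundamental", "specific",    "abstract structures"),
}

def describe(name: str) -> str:
    objects, configs, domain = reasoning_types[name]
    return f"{name} reasoning learns {configs} configurations of {objects} objects in {domain}"

for name in reasoning_types:
    print(describe(name))
```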


Exploring theories

1. Science

I think there are three ways of doing science:

  1. You predict a specific phenomenon.

  2. You study types of phenomena. (qualities of phenomena)

  3. You study types of changes (D): how the phenomenon changes/gets changed by some other thing Y. D and Y need to be important even outside of this phenomenon.

Imagine you want to explain combustion (why/how things burn):

  1. You try to predict combustion. This doesn't work, because you already know "everything" about burning and there are many possible theories. You end up making things up because there's not enough new data.
  2. You try to compare combustion to other phenomena. You end up fantasizing about imaginary qualities of the phenomenon. At this level you get something like theories of "classical elements" (fantasies about superficial similarities).
  3. You find or postulate a new thing (Y) which affects/gets affected (D) by combustion. Y and D need to be important in many other phenomena. If Y is "types of matter" and D is "releasing / absorbing", this gives you Phlogiston theory. If Y is "any matter" and D is "conservation of mass" and "any transformations of matter", you get Lavoisier's theory. If Y is "small pieces of matter (atoms)" and D is "atoms hitting each other", you get Kinetic theory of gases.

So, I think phlogiston theory was a step in the right direction, but it failed because the choice of Y and D wasn't abstract enough.

I think most significant scientific breakthroughs require level 3 ideas. Partially "by definition": if a breakthrough is not "level 3", then it means it's contained in a (very) specific part of reality.

2. Math

I think there are three ways of doing math:

  1. You explore specific mathematical structures.

  2. You explore types of mathematical structures. And types of types. And typologies. At this level you may get something like Category theory.

  3. You study types of changes (D): how equations change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) math.

Mathematico-philosophical insights

Let's look at math through the lens of the 3rd level.

The mathematico-philosophical concepts listed below are all "3rd level". But we can classify them further, creating a new set of three levels of exploration (yes, this is recursion!). Let's do this. I think there are three levels of mathematico-philosophical concepts:

  1. Concepts that change the properties of things we count. (e.g. topology, fractals, graph theory)
  2. Concepts that change the meaning of counting. (e.g. probability, computation, utility, sets, group theory, Gödel's incompleteness theorems and Tarski's undefinability theorem)
  3. Concepts that change the essence of counting. (e.g. Calculus, vectors, probability, actual infinity, fractal dimensions)

So, Calculus is really "the king of kings" and "the insight of insights". 3rd level of the 3rd level.

3. Physico-philosophical insights

I would classify physico-philosophical concepts as follows:

  1. Concepts that change the way movement affects itself. E.g. Net force, Wave mechanics, Huygens–Fresnel principle

  2. Concepts that change the "meaning" of movement. E.g. the idea of reference frames (principles of relativity), curved spacetime (General Relativity), the idea of "physical fields" (classical electromagnetism), conservation laws and symmetries, predictability of physical systems.

  3. Concepts that change the "essence" of movement, the way movement relates to basic logical categories. E.g. properties of physical laws and theories (Complementarity; AdS/CFT correspondence), the beginning/existence of movement (cosmogony, "why is there something rather than nothing?", Mathematical universe hypothesis), the relationship between movement and infinity (Supertasks) and computation/complexity, the way "possibility" spreads/gets created (Quantum mechanics, Anthropic principle), the way "relativity" gets created (Mach's principle), the absolute mismatch between perception and the true nature of reality (General Relativity, Quantum Mechanics), the nature of qualia and consciousness (Hard problem of consciousness), the possibility of Theory of everything and the question "how far can you take [ontological] reductionism?", the nature of causality and determinism, the existence of space and time and matter and their most basic properties, interpretation of physical theories (interpretations of quantum mechanics).


Exploring meta ideas

To define "meta ideas" we need to think about many pairs of "Y, D" simultaneously. This is the most speculative part of the post. Remember, you can treat those speculations simply as sci-fi ideas.

Each pair of abstract concepts (Y, D) defines a "language" for describing reality. And there's a meta-language which connects all those languages. Or rather, there are many meta-languages. Each meta-language can be described by a pair of abstract concepts too (Y, D).

^(Instead of "languages" I could use the word "models". But I wanted to highlight that those "models" don't have to be formal in any way.)

I think the idea of "meta-languages" can be used to analyze:

  • Consciousness. You can say that consciousness is "made of" multiple abstract interacting languages. On one hand it's just a trivial description of consciousness, on the other hand it might have deeper implications.
  • Qualia. You can say that qualia is "made of" multiple abstract interacting languages. On one hand this is a trivial idea ("qualia is the sum of your associations"), on the other hand this formulation adds important specific details.
  • The ontology of reality. You can argue that our ways to describe reality ("physical things" vs. purely mathematical concepts, subjective experience vs. physical world, high-level patterns vs. complete reductionism, physical theory vs. philosophical ontology) all conflict with each other and lead to paradoxes when taken to the extreme, but can't exist without each other. Maybe they are all intertwined?
  • Meta-ethics. You can argue that concepts like "goodness" and "justice" can't be reduced to any single type of definition. So, you can try to reduce them to a synthesis of many abstract languages. See G. E. Moore ideas about indefinability: the naturalistic fallacy, the open-question argument.

According to the framework, ideas about "meta-languages" define the limit of conceivable ideas.

If you think about it, it's actually a quite trivial statement: "meta-models" (consisting of many normal models) are the limit of conceivable models. Your entire conscious mind is such a "meta-model". If no model works for describing something, then a "meta-model" is your last resort. On one hand "meta-models" is a very trivial idea^(1), on the other hand nobody has ever cared to explore the full potential of the idea.

^(1 for example, we have a "meta-model" of physics: a combination of two wrong theories, General Relativity and Quantum Mechanics.)

Nature of percepts

I talked about qualia in general. Now I just want to throw out my idea about the nature of particular percepts.

There are theories and concepts which link percepts to "possible actions" and "intentions": see Affordance. I like such ideas, because I like to think about types of actions.

So I have a variation of this idea: I think that any percept gets created by an abstract dynamic (Y, D) or many abstract dynamics. Any (important) percept corresponds to a unique dynamic. I think abstract dynamics bind concepts.

^(But I have only started to think about this. I share it anyway because I think it follows from all the other ideas.)

P.S.

Thank you for reading this.

If you want to discuss the idea, please focus on the idea itself and its particular applications. Or on exploring particular topics!

Comment by Q Home on Q Home's Shortform · 2023-02-23T10:52:20.784Z · LW · GW

(draft of a future post)

I want to share my model of intelligence and research. You won't agree with it at the first glance. Or at the third glance. (My hope is that you will just give up and agree at the 20th glance.)

But that's supposed to be good: it means the model is original and brave enough to make risky statements.

In this model any difference in "intelligence levels" or any difference between two minds in general boils down to "commitment level".

What is "commitment"?

On some level, "commitment" is just a word. It's not needed to define the ideas I'm going to talk about. What's much more important is the three levels of commitment. There are often three levels which follow the same pattern, the same outline:

Level 1. You explore a single possibility.

Level 2. You want to explore all possibilities. But you are paralyzed by the amount of possibilities. At this level you are interested in qualities of possibilities. You classify possibilities and types of possibilities.

Level 3. You explore all possibilities through a single possibility. At this level you are interested in dynamics of moving through the possibility space. You classify implications of possibilities.

...

I'm going to give specific examples of the pattern above. This post is kind of repetitive, but it wasn't AI-generated, I swear. Repetition is a part of commitment.

Why is commitment important?

My explanation won't be clear before you read the post, but here it goes:

  • Commitment describes your values and the "level" of your intentionality.

  • Commitment describes your level of intelligence (in a particular topic). Compared to yourself (your potential) or other people.

  • Commitments are needed for communication. Without shared commitments it's impossible for two people to find a common ground.

  • Commitment describes the "true content" of an argument, an idea, a philosophy. Ultimately, any property of a mind boils down to "commitments".


Basics

1. Commitment to exploration

I think there are three levels of commitment to exploration.

Level 1. You treat things as immediate means to an end.

Imagine two enemy cavemen teleported into a laboratory. They try to use whatever they find to beat each other, without studying/exploring what they're using. So, they are just throwing microscopes and beakers at each other. They throw anti-matter guns at each other without even activating them.

Level 2. You explore things for the sake of it.

Think about mathematicians. They can explore math without any goal.

Level 3. You use particular goals to guide your exploration of things. Even though you would care about exploring them without any goal anyway. The exploration space is just too large, so you use particular goals to narrow it down.

Imagine a physicist who explores mathematics by considering imaginary universes and applying physical intuition to discover deep mathematical facts. Such a person uses a particular goal/bias to guide "pure exploration". (inspired by Edward Witten, see Michael Atiyah's quote)

More examples

  • In terms of exploring ideas, our culture is at level 1 (angry caveman). We understand ideas only as "ideas of getting something (immediately)" or "ideas of proving something (immediately)". We are not interested in exploring ideas for the sake of it. The only metrics we apply to ideas are "(immediate) usefulness" and "trueness", not "beauty", "originality" and "importance". People in general are at level 1. Philosophers are at level 1 or "1.5". The rationality community is at level 1 too (sadly): rationalists still mostly care only about immediate usefulness and truth.

  • In terms of exploring argumentation and reasoning, our culture is at level 1. If you never thought "stupid arguments don't exist", then you are at level 1: you haven't explored arguments and reasoning for the sake of it, you immediately jumped to assuming "The Only True Way To Reason" (be it your intuition, the scientific method, a particular ideology or Bayesian epistemology). You haven't stepped outside of your perspective a single time. Almost everyone is at level 1. Eliezer Yudkowsky is at level 3, but in a much narrower field: Yudkowsky explored rationality with the specific goal/bias of AI safety. However, overall Eliezer is at level 1 too: he never studied human reasoning outside of what he thinks is "correct".

I think this is kind of bad. We are at level 1 in the main departments of human intelligence and human culture. Two levels below our true potential.

2. Commitment to goals

I think there are three levels of commitment to goals.

Level 1. You have a specific selfish goal.

"I want to get a lot of money" or "I want to save my friends" or "I want to make a ton of paperclips", for example.

Level 2. You have an abstract goal. But this goal doesn't imply much interaction with the real world.

"I want to maximize everyone's happiness" or "I want to prevent (X) disaster", for example. This is a broad goal, but it doesn't imply actually learning and caring about anyone's desires (until the very end). Rationalists are at this level of commitment.

Level 3. You use particular goals to guide your abstract goals.

Some political activists are at this level of commitment. (But please, don't bring CW topics here!)

3. Commitment to updating

"Commitment to updating" is the ability to re-start your exploration from the square one. I think there are three levels to it.

Level 1. No updating. You never change ideas.

You just keep piling up your ideas into a single paradigm your entire life. You turn beautiful ideas into ugly ones so they fit with all your previous ideas.

Level 2. Updating. You change ideas.

When you encounter a new beautiful idea, you are ready to reformulate your previous knowledge around the new idea.

Level 3. Updating with "check points". You change ideas, but you use old ideas to prime new ones.

When you explore an idea, you mark some "check points" which you reached with that idea. When you ditch the idea for a new one, you still keep in mind the check points you marked. And use them to explore the new idea faster.


Science

4.1 Commitment and theory-building

I think there are three levels of commitment in theory-building.

Level 1.

You build your theory using only "almost facts". I.e. you come up with "trivial" theories which are almost indistinguishable from the things we already know.

Level 2.

You build your theory on speculations. You "fantasize" important properties of your idea (which are important only to you or your field).

Level 3.

You build your theory on speculations. But those speculations are important even outside of your field.

I think Eliezer Yudkowsky and LW did theory-building of the 3rd level. A bunch of LW ideas are philosophically important even if you disagree with Bayesian epistemology (Eliezer's view on ethics and math, logical decision theories and some Alignment concepts).

4.2 Commitment to explaining a phenomenon

I think there are three types of commitment in explaining a phenomenon.

Level 1.

You just want to predict the phenomenon. But many-many possible theories can predict the phenomenon, so you need something more.

Level 2.

You compare the phenomenon to other phenomena and focus on its qualities.

That's where most theories go wrong: people become obsessed with their own fantasies about qualities of a phenomenon.

Level 3.

You focus on dynamics which connect this phenomenon to other phenomena. You focus on overlapping implications of different phenomena. 3rd level is needed for any important scientific breakthrough. For example:

Imagine you want to explain combustion (why/how things burn). On one hand you already "know everything" about the phenomenon, so what do you even do? Level 1 doesn't work. So, you try to think about qualities of burning, types of transformations, types of movement... but that won't take you anywhere. Level 2 doesn't work either. The right answer: you need to think not about qualities of transformations and movements, but about dynamics (conservation of mass, kinetic theory of gases) which connect different types of transformations and movements. Level 3 works.


Epistemology pt. 1

5. Commitment and epistemology

I think there are three levels of commitment in epistemology.

Level 1. You assume the primary reality of the physical world. (Physicism)

Take statements "2 + 2 = 4" and "God exists". To judge those statements, a physicist is going to ask "Do those statements describe reality in a literal way? If yes, they are true."

Level 2. You assume the primary reality of statements of some fundamental language. (Descriptivism)

To judge statements, a descriptivist is going to ask "Can those statements be expressed in the fundamental language? If yes, they are true."

Level 3. You assume the primary reality of semantic connections between statements of languages. And the primary reality of some black boxes which create those connections. (Connectivism) You assume that something physical shapes the "language reality".

To judge statements, a connectivist is going to ask "Do those statements describe an important semantic connection? If yes, they are true."

...

Recap. Physicist: everything "physical" exists. Descriptivist: everything describable exists. Connectivist: everything important exists. The physicist can be too specific and the descriptivist too generous. (This pattern of being "too specific" or "too generous" repeats for all commitment types.)

Thinking at the level of semantic connections should be natural to people (because they use natural language and... neural nets in their brains!). And yet this idea is extremely alien to people epistemology-wise.

Implications for rationality

In general, rationalists are "confused" between level 1 and level 2. I.e. they often treat level 2 very seriously, but aren't fully committed to it.

Eliezer Yudkowsky is "confused" between level 1 and level 3. I.e. Eliezer has a lot of "level 3 ideas", but doesn't apply level 3 thinking to epistemology in general.

  • On one hand, Eliezer believes that "map is not the territory". (level 1 idea)
  • On another hand, Eliezer believes that math is an "objective" language shaped by the physical reality. (level 3 idea)
  • Similarly, Eliezer believes that human ethics are defined by some important "objective" semantic connections (which can evolve, but only to a degree). (level 3)
  • "Logical decision theories" treat logic as something created by connections between black boxes. (level 3)
  • When you do Security Mindset, you should make not only "correct", but beautiful maps. Societal properties of your map matter more than your opinions. (level 3)

So, Eliezer has a bunch of ideas which can be interpreted as "some maps ARE the territory".

6. Commitment and uncertainty

I think there are three levels of commitment in doubting one's own reasoning.

Level 1.

You're uncertain about superficial "correctness" of your reasoning. You worry if you missed a particular counter argument. Example: "I think humans are dumb. But maybe I missed a smart human or applied a wrong test?"

Level 2.

You un-systematically doubt your assumptions and definitions. Maybe even your inference rules a little bit (see "inference objection"). Example: "I think humans are dumb. But what is a "human"? What is "dumb"? What is "is"? And how can I be sure in anything at all?"

Level 3.

You doubt the semantic connections (e.g. inference rules) in your reasoning. You consider particular dynamics created by your definitions and assumptions. "My definitions and assumptions create this dynamic (not present in all people). Can this dynamic exploit me?"

Example: "I think humans are dumb. But can my definition of "intelligence" exploit me? Can my pessimism exploit me? Can this be an inconvenient way to think about the world? Can my opinion turn me into a fool even I'm de facto correct?"

...

Level 3 is like "security mindset" applied to your own reasoning. LW rationality mostly teaches against it, suggesting you to always take your smallest opinions at face value as "the truest thing you know". With some exceptions, such as "ethical injunctions", "radical honesty", "black swan bets" and "security mindset".


Epistemology pt. 2

7. Commitment to understanding/empathy

I think there are three levels of commitment in understanding your opponent.

Level 1.

You can pass the Ideological Turing Test in a superficial way (you understand the structure of the opponent's opinion).

Level 2. "Telepathy".

You can "inhabit" the emotions/mindset of your opponent.

Level 3.

You can describe the opponent's position as a weaker version/copy of your own position. And additionally you can clearly imagine how your position could turn out to be "the weaker version/copy" of the opponent's position. You find a balance between telepathy and "my opinion is the only one which makes sense!"

8. Commitment to "resolving" problems

I think there are three levels of commitment in "resolving" problems.

Level 1.

You treat a problem as a puzzle to be solved by Your Favorite True Epistemology.

Level 2.

You treat a problem as a multi-layered puzzle which should be solved on different levels.

Level 3.

You don't treat a problem as a self-contained puzzle. You treat it as a "symbol" in the multitude of important languages. You can solve it by changing its meaning (by changing/exploring the languages).

Applying this type of thinking to the Unexpected hanging paradox:

I don't treat this paradox as a chess puzzle: I don't think it's something that could be solved or even "made sense of" from the inside. You need outside context. Like, does it ask you to survive? Then you can simply expect the hanging every day and be safe. (Though - can you do this to your psychology?) Or does the paradox ask you to come up with formal reasoning rules to solve it? But you can make any absurd reasoning system - to make a meaningful system you need to answer "for what purposes this system is going to be needed except this paradox". So, I think that "from the inside" there's no ground truth (though it can exist "from the outside"). Without context there's a lot of simple, but absurd or trivial solutions like "ignore logic, think directly about outcomes" or "come up with some BS reasoning system". Or say "Solomonoff induction solves all paradoxes: even if it doesn't, it's the best possible predictor of reality, so just ignore philosophers, lol".


Alignment pt. 1

9.1 Commitment to morality

I think there are three levels of commitment in morality.

Level 1. Norms, desires.

You analyze norms of specific communities and desires of specific people. That's quite easy: you are just learning facts.

Level 2. Ethics and meta-ethics.

You analyze similarities between different norms and desires. You get to pretty abstract and complicated values such as "having agency, autonomy, freedom; having an interesting life; having an ability to form connections with other people". You are lost in contradictory implications, interpretations and generalizations of those values. You have a (meta-)ethical paralysis.

Level 3. "Abstract norms".

You analyze similarities between implications of different norms and desires. You analyze dynamics created by specific norms. You realize that the most complicated values are easily derivable from the implications of the simplest norms. (Not without some bias, of course, but still.)

I think moral philosophers and Alignment researchers are seriously dropping the ball by ignoring the 3rd level. Acknowledging the 3rd level doesn't immediately solve Alignment, but it can pretty much "solve" ethics (with a bit of effort).

9.2 Commitment to values

I think there are three levels of values.

Level 1. Inside values ("feeling good").

You care only about things inside of your mind. For example, do you feel good or not?

Level 2. Real values.

You care about things in the real world. Even though you can't care about them directly. But you make decisions to not delude yourself and not "simulate" your values.

Level 3. Semantic values.

You care about elements of some real system. And you care about proper dynamics of this system. For example, you care about things your friend cares about. But it's also important to you that your friend is not brainwashed, not controlled by you. And you accept that one day your friend may stop caring about anything. (Your value may "die" a natural death.)

3rd level is the level of "semantic values". They are not "terminal values" in the usual sense. They can be temporal and history-dependent.

9.3 Commitment and research interest

So, you're interested in ways in which an AI can go wrong. What specifically can you be interested in? I think there are three levels to it.

Level 1. In what ways some AI actions are bad?

You classify AI bugs into types. For example, you find "reward hacking" type of bugs.

Level 2. What qualities of AIs are good/bad?

You classify types of bugs into "qualities". You find such potentially bad qualities as "AI doesn't care about the real world" and "AI doesn't allow itself to be corrected (lack of corrigibility)".

Level 3. What bad dynamics are created by bad actions of AI? What good dynamics are destroyed?

Assume AI turned humanity into paperclips. What's actually bad about that, beyond the very first obvious answer? What good dynamics did this action destroy? (Some answers: it destroyed the feedback loop, the connection between the task and its causal origin (humanity), the value of paperclips relative to other values, the "economical" value of paperclips, the ability of paperclips to change their value.)

On the 3rd level you classify different dynamics. I think people completely ignore the 3rd level. In both Alignment and moral philosophy. 3rd level is the level of "semantic values".


Alignment pt. 2

10. Commitment to Security Mindset

I think Security Mindset has three levels of commitment.

Level 1. Ordinary paranoia.

You have great imagination, you can imagine very creative attacks on your system. You patch those angles of attack.

Level 2. Security Mindset.

You study your own reasoning about safety of the system. You check if your assumptions are right or wrong. Then, you try to delete as many assumptions as you can, even if they seem correct to you! You also delete anomalies of the system even if they seem harmless. You try to simplify your reasoning about the system seemingly "for the sake of it".

Level 3.

You design a system which would be safe even in a world with changing laws of physics and mathematics. Using some bias, of course (otherwise it's impossible).

Humans, or at least idealized humans, are "level 3 safe". All/almost all current approaches to Alignment don't give you a "level 3 safe" AI.

11. Commitment to Alignment

I think there are three levels of commitment a (mis)aligned AI can have. Alternatively, those are three or two levels at which you can try to solve the Alignment problem.

Level 1.

AI has a fixed goal or a fixed method of finding a goal (which likely can't be Aligned with humanity). It respects only its own agency. So, ultimately it does everything it wants.

Level 2.

AI knows that different ethics are possible and is completely uncertain about ethics. AI respects only other people's agency. So, it doesn't do anything at all (except preventing, a bit lazily, 100% certain destruction and oppression). Or it requires infinite permission:

  1. Am I allowed to calculate "2 + 2"?
  2. Am I allowed to calculate "2 + 2" even if it leads to a slight change of the world?
  3. Am I allowed to calculate "2 + 2" even if it leads to a slight change of the world which you can't fully comprehend even if I explain it to you?
  4. ...
  5. Wait, am I allowed to ask those questions? I'm already manipulating you by boring you to death. I can't even say anything.

Level 3.

AI can respect both its own agency and the agency of humanity. AI finds a way to treat its agency as the continuation of the agency of people. AI makes sure it doesn't create any dynamic which couldn't be reversed by people (unless there's nothing else to do). So, AI can both act and be sensitive to people.

Implications for Alignment research

I think a fully safe system exists only on level 3. The safest system is the system which understands what "exploitation" means, so it never willingly exploits its rewards in any way. Humans are an example of such a system.

I think alignment researchers are "confused" between level 1 and level 3. They try to fix different "exploitation methods" (ways AI could exploit its rewards) instead of making the AI understand what "exploitation" means.

I also think this is the reason why alignment researchers don't cooperate much, pushing in different directions.


Perception

12. Commitment to properties

Commitments exist even on the level of perception. There are three levels of properties to which your perception can react.

Level 1. Inherent properties.

You treat objects as having more or less inherent properties. "This person is inherently smart."

Level 2. Meta-properties.

You treat any property as universal. "Anyone is smart under some definition of smartness."

Level 3. Semantic properties.

You treat properties only as relatively attached to objects: different objects form a system (a "language") where properties get distributed between them and differentiated. "Everyone is smart, but in a unique way. And those unique ways are important in the system."

13.1 Commitment to experiences and knowledge

I think there are three levels of commitment to experiences.

Level 1.

You're interested in particular experiences.

Level 2.

You want to explore all possible experiences.

Level 3.

You're interested in real objects which produce your experiences (e.g. your friends): you're interested in what knowledge "all possible experiences" could reveal about them. You want to know where physical/mathematical facts and experiences overlap.

13.2 Commitment to experience and morality

I think there are three levels of investigating the connection between experience and morality.

Level 1.

You study how experience causes us to do good or bad things.

Level 2.

You study all the different experiences "goodness" and "badness" causes in us.

Level 3.

You study dynamics created by experiences, which are related to morality. You study implications of experiences. For example: "loving a sentient being feels fundamentally different from eating a sandwich. food taste is something short and intense, but love can be eternal and calm. this difference helps to not treat other sentient beings as something disposable"

I think the existence of the 3rd level isn't acknowledged much. And yet it could be important for alignment. Most versions of moral sentimentalism are 2nd level at best. Epistemic Sentimentalism can be 3rd level.



Final part

Specific commitments

You can ponder your commitment to specific things.

Are you committed to information?

Imagine you could learn anything (and forget it if you want). Would you be interested in learning different stuff more or less equally? You could learn something important (e.g. the most useful or the most abstract math), but you also could learn something completely useless - such as the life story of every ant who ever lived.

I know, this question is hard to make sense of: of course, anyone would like to learn everything/almost everything if there was no downside to it. But if you have a positive/negative commitment about the topic, then my question should make some sense anyway.

Are you committed to people?

Imagine you got extra two years to just talk to people. To usual people on the street or usual people on the Internet.

Would you be bored hanging out with them?

My answers: >!Maybe I was committed to information in general as a kid. Then I became committed to information related to people, produced by people, known by people.!<

My inspiration for writing this post

I encountered a bunch of people who are more committed to exploring ideas (and taking ideas seriously) than usual. More committed than most rationalists, for example.

But I felt those people lack something:

  • They are able to explore ideas, but don't care about that anymore. They care only about their own clusters of idiosyncratic ideas.

  • They have very vague goals which are compatible with any specific actions.

  • They don't care if their ideas could even in principle matter to people. They have "disconnected" from other people, from other people's context (through some level of elitism).

  • When they acknowledge you as "one of them", they don't try to learn your ideas or share their ideas or argue with you or ask for your help in solving a problem.

So, their commitment remains very low. And they are not "committed" to talking.

Conclusion

If you have a high level of commitment (3rd level) at least to something, then we should find a common language. You may even be like a sibling to me.

Thank you for reading this post. 🗿

Cognition

14.1 Studying patterns

I think there are three levels of commitment to patterns.

  1. You study particular patterns.
  2. You study all possible patterns: you study qualities of patterns.
  3. You study implications of patterns. You study dynamics of patterns: how patterns get updated or destroyed when you learn new information.

14.2 Patterns and causality

I think there are three levels in the relationship between patterns and causality. I'm going to give examples about visual patterns.

Level 1.

You learn which patterns are impossible due to local causal processes.

For example: "I'm unlikely to see a big tower made of eggs standing on top of each other". It's just not a stable situation due to very familiar laws of physics.

Level 2.

You learn statistical patterns (correlations) which can have almost nothing to do with causality.

For example: "people like to wear grey shirts".

Level 3.

You learn patterns which have a strong connection to other patterns and basic properties of images. You could say such patterns are created/prevented by "global" causal processes.

For example: "I'm unlikely to see a place fully filled with dogs. dogs are not people or birds or insects, they don't create such crowds". This is very abstract, connects to other patterns and basic properties of images.

Implications for Machine Learning

I think...

  • It's likely that Machine Learning models don't learn level 3 patterns as well as they could, as sharply as they could.

  • Machine Learning models should be 100% able to learn level 3 patterns. It shouldn't require any specific data.

  • Learning/comparing level 3 patterns is interesting enough on its own. It could be its own area of research. But we don't apply statistics/Machine Learning to try mining those patterns. This may be a missed opportunity for humans.

I think researchers are making a blunder by not asking "what kinds of patterns exist? what patterns can be learned in principle?" (not talking about universal approximation theorem)

15. Cognitive processes

Suppose you want to study different cognitive processes, skills, types of knowledge. There are three levels:

  1. You study particular cognitive processes.

  2. You study qualities of cognitive processes.

  3. You study dynamics created by cognitive processes. How "actions" of different cognitive processes overlap.

I think you can describe different cognitive processes in terms of patterns they learn. For example:

  • Causal reasoning learns abstract configurations of abstract objects in the real world. So you can learn stuff like "this abstract rule applies to most objects in the world".
  • Symbolic reasoning learns abstract configurations of abstract objects in your "concept space". So you can learn stuff like ""concept A contains concept B" is an important pattern".
  • Correlational reasoning learns specific configurations of specific objects.
  • Mathematical reasoning learns specific configurations of abstract objects. So you can build arbitrary structures with abstract building blocks.
  • Self-aware reasoning can transform abstract objects into specific objects. So you can think thoughts like, for example, "maybe I'm just a random person with random opinions".

I think all this could be easily enough formalized.


Meta-level

Can you be committed to exploring commitment?

I think yes.

One thing you can do is to split topics into sub-topics and raise your commitment in every particular sub-topic. Vaguely similar to gradient descent. That's what I've been doing in this post so far.

Another thing you can do is to apply recursion. You can split any topic into 3 levels of commitment. But then you can split the third level into 3 levels too. So, there's potentially an infinity of levels of commitment. And there can be many particular techniques for exploiting this fact.
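A tiny sketch of that recursion, assuming we only ever split the 3rd level further (the function and its labels are purely illustrative):

```python
# Toy sketch of the recursion mentioned above: the 3rd level of any topic can
# itself be split into three levels, giving addresses like "topic.3.1".

def split_levels(topic: str, depth: int) -> list[str]:
    if depth == 0:
        return [topic]
    levels = []
    for i in (1, 2, 3):
        label = f"{topic}.{i}"
        # only the 3rd level gets split further
        levels.extend(split_levels(label, depth - 1) if i == 3 else [label])
    return levels

print(split_levels("commitment", 2))
# ['commitment.1', 'commitment.2', 'commitment.3.1', 'commitment.3.2', 'commitment.3.3']
```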

But the main thing is the three levels of "exploring ways to explore commitment":

  1. You study particular ways to raise commitment.
  2. You study all possible ways to raise commitment.
  3. You study all possible ways through a particular way. You study dynamics and implications which the ways create.

I don't have enough information or experience for the 3rd level right now.

Comment by Q Home on Maybe you can learn exotic experiences via analytical thought · 2023-01-20T05:23:50.536Z · LW · GW

Thank you for sharing your perspective, even though most of it was too technical for me to understand.

I already believed something akin to panpsychism, but with a weighting function towards higher interaction processes of some specific kind I'm not sure I know how to mathematically identify, but which relates closely to integrated information, signal to noise ratio, compressibility, agency.

As I understand, these are your expectations about the properties of qualia: some unknown process + integrated information + signal/noise ratio, compressibility, agency.

I believe my post should give you new information (if the post is true) and new ideas.

Comment by Q Home on Maybe you can learn exotic experiences via analytical thought · 2023-01-20T04:20:58.947Z · LW · GW

Why do you think so? That's not a popular position. The post contains arguments for something similar, but I suppose you believe that for different reasons.

Comment by Q Home on Maybe you can learn exotic experiences via analytical thought · 2023-01-20T02:43:31.108Z · LW · GW

I agree, there are physical limits. But this doesn't really matter for the post.

The post claims there are a lot of crazy experiences which are within reach. Not even counting altered states of consciousness. And the post studies the question in general, explores what experiences are possible in principle (for any possible being). Not everything detected by a body gives a unique qualia. Or it's not obvious if this is true or false. (Which Mental States Possess Qualia?)

Comment by Q Home on Motivated Cognition and the Multiverse of Truth · 2022-12-16T10:49:45.417Z · LW · GW

How does instrumental rationality model informal argumentation/informal reasoning?

In the very general sense, anything is instrumental rationality to me if I believe that it works.

Comment by Q Home on Q Home's Shortform · 2022-12-11T10:54:39.433Z · LW · GW

"Everything is relative."

You know this phrase, right? But "relativity" is relative too. Maybe something is absolute.

But "relativity of relativity" is relative too. Maybe nothing is absolute after all... Those thoughts create an infinite tower of meta-levels.

If you think about the statement "truth = lie" ("you can go from T to F") you can get a similar tower. (Because it also implies "you can NOT go from T to F" and "you can go from "you can NOT go from T to F" to "you can go from T to F"" and so on.) It's not formal, but still interesting. Informally, the statement "truth = lie" is equivalent to "everything is relative".

Hierarchy of meta-levels is relative.

Imagine an idealist and a materialist. The materialist thinks "I'm meta compared to the idealist - I can analyze their thought process through physics". The idealist thinks "the materialist thinks they're meta compared to me, but thinking in terms of physics is just one possible experience". So, "my thought process = the most important thing" and "my thought process + physics = the most important things" are both meta- compared to each other; they both can do meta-analysis of each other.

Both materialism and idealism can model each other. Materialism can be modeled by meta-idealism. Meta-idealism can be modeled by meta-materialism. Meta-materialism can be modeled by meta-meta-idealism. And so on. (Those don't have to be different models, it's just convenient to think about it in terms of levels.)

The same thing with altruism and selfishness. Altruism can be modeled by meta-selfishness. Meta-selfishness can be modeled by meta-meta-altruism. And you can abstract it to any property (A) and its negation (not A), because any property can be treated as a model of the world. So, this idea can be generalized as "A = not A".

Points and lines

Next step of the idea: for meta-level objects, lower-level objects are indistinguishable.

If you think in terms of points, two different points (A and B) are different objects to you. If you think in terms of lines, then points A and B may be parts of the same object. Or, on the other hand, the same point can be a part of completely different objects.

A universe of objects

Now imagine that some points are red and other points are blue. And we don't care about the shape of a line.

Level-1 lines contain only blue (positive) or only red (negative) points.

Level-2 lines can contain both types of points. E.g. they can contain mostly blue (complex positive) or mostly red (complex negative) points.

So, you can get different kinds of objects out of this, somewhat similar to numbers. I guess you can do this in many different ways. For example, you may have a spectrum of colors. Or you may have positive and negative spectrums. To me it's very important, because it connects to my synesthesia: see here. The post is very unclear (I don't advise reading it), but sadly I don't know how to explain everything better yet.
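A toy sketch of the classification above, assuming a "line" is just a multiset of colored points (shape ignored, as stated) and "mostly" means a strict majority:

```python
# Toy sketch of the "universe of objects": lines are classified by the mix
# of red and blue points they contain. The categories mirror the level-1 and
# level-2 lines described above; the rules are illustrative only.

from collections import Counter

def classify(line: list[str]) -> str:
    counts = Counter(line)
    blue, red = counts["blue"], counts["red"]
    if red == 0:
        return "level-1 positive (only blue points)"
    if blue == 0:
        return "level-1 negative (only red points)"
    if blue > red:
        return "level-2 complex positive (mostly blue)"
    if red > blue:
        return "level-2 complex negative (mostly red)"
    return "level-2 balanced"

print(classify(["blue", "blue", "blue"]))
print(classify(["red", "red"]))
print(classify(["blue", "blue", "red"]))
print(classify(["red", "red", "red", "blue"]))
```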

Comment by Q Home on Motivated Cognition and the Multiverse of Truth · 2022-12-02T08:19:35.351Z · LW · GW

Classic understanding of MC: "motivation = truth". My understanding of MC: "motivations + facts = truth". (See.)

Yes, my understanding of MC is "unusual". But I think it's fair:

  • Classic understanding always was flawed. Because it always covered just a small part of MC. (See.) You don't need my ideas to see it.
  • You can say my understanding is natural: it's what you get when you treat MC seriously or try to steelman it. And if we never even tried to steelman MC - that's on us.
  • People get criticized for all kinds of MC or "MC-looking" styles of thinking. E.g. for politics. Not only for witchcraft. My definition may be unusual, but it describes an already existing phenomenon.

So it's not that I cooked up a thing which never existed before in any way and slapped a familiar name on it.

Could you try tabooing MC?

Some ideas from the post: (most of them from here)

  • MC is a way to fill the gaps of informal arguments.
  • MC is a way to choose definitions of concepts. Decide what definitions are more important.
  • MC is a way to tie abstract labels to real things. Which is needed in order to apply formal logic or calculate probabilities. (See.)
  • MC is a way to "turn Bayesianism inside-out". (I don't know math, so I can't check it precisely.) You get MC if you try to model reality as a single fuzzy event. (See.) This fuzzy event becomes your "motivation" and you update it and its usage based on facts.
  • MC is a way to add additional parameters to "truth" and "simplicity" (when you can't estimate those directly).

We can explore those. I can analyze an argument in some of those terms (definitions, labels vs. real things and etc.).

Comment by Q Home on Motivated Cognition and the Multiverse of Truth · 2022-12-01T23:38:11.862Z · LW · GW

Also, you seem to have slid from "motivated cognition works to produce true beliefs/optimize the world" to the much weaker claim of "some people use motivated cognition, you need to understand it to predict there behavior". This is a big jump, and feels mote and bailey.

Most parts of the post are explicitly described as "this is how motivated cognition helps us, even if it's wrong". Stronger claims return later. And your weaker claim (about predicting people) is still strong and interesting enough.

No you don't. Penroses theory is totally abstract computability theory. If it were true, then so what? The best for humans facts are something like "alignment is easy, FAI built next week". This only works if penrose somehow got a total bee in his bonnet about uncomputability, it greatly offended his sensibilities that humans couldn't know everything. Even though we empirically don't. Even though pragmatic psycological bounds are a much tighter constraint than computability. In short, your theory of "motivated cognition" doesn't help predict much. Because you need to assume penroses motivations are just as wacky.

There I talk about the most interesting possibility in the context of physics and math, not Alignment. And I don't fully endorse Penrose's "motivation": even without Alignment, his theory is not the most interesting/important thing to me. I treat Penrose's theory as a local maximum of optimism, not the global maximum. You're right. But this still helps to remember/highlight his opinions.

I'm not sure FAI is the global maximum of optimism either:

  • There may be things that are metaphysically more important. (Something about human intelligence and personality.)
  • We have to take facts into account too. And the facts tell us that MC doesn't help to avoid death and suffering by default. Maybe it could help if it were more widespread.

Those two factors make me think FAI wouldn't be guaranteed if we suddenly learned that "motivated cognition works (for the most part)".

Comment by Q Home on Motivated Cognition and the Multiverse of Truth · 2022-12-01T22:56:09.741Z · LW · GW

I notice that my explanation of MC failed somewhere (you're the second person to tell me). So, could you expand on that?

Or that there are too many layers of inference between us.

Maybe we just have different interests or "commitments" to those interests. For example:

  • I'm "a priori" interested in anything that combines motivations and facts. (I explained why here.)
  • I'm interested in high-level argumentation. I notice that Bayesianism doesn't model it much (or any high-level reasoning).
  • Bayesianism often criticizes MC, and if MC were true it would be a big hit to Bayesianism. So the topic of MC is naturally interesting.

If you're "committed" to those interests, you don't react like "I haven't understood this post about MC"; you react more like "I have my own thoughts about MC. This post differs from them. Why?" or "I tried to think about MC myself, but I hit a wall. This post claims to make progress. I don't understand: how?". In other words, because of the commitment you already have thoughts about the topic, or you interpret the topic through the lens of "I should've thought about this myself".

My current understanding is that you're trying to say something like "optimism and imagination are good because they help you push through in interesting directions, rather than just stopping at the cold wall of logic".

Yes, this is one of the uses of optimism (for imagination). But we need commitments to some "philosophical" or conflicting topics to make this interesting. If you're not "a priori" interested in the topic of optimism, then you can just say "optimism for imagination? Sure, but I can use anything else for imagination too". Any idea requires an anchor to an already existing topic or a conflict in order to be interesting. Without such anchors, a summary of any idea is going to sound empty. (On the meta level, this is one of my arguments for MC: it allows you to perceive information with more conflict, with more anchors.)

Also, maybe your description excludes the possibility that MC could actually work for predictions. I think it's important to not exclude this possibility (even if we think it's false) in order to study MC in the most principled way.

Comment by Q Home on Motivated Cognition and the Multiverse of Truth · 2022-11-30T23:41:12.937Z · LW · GW

Do you think an analysis of more specific arguments, opinions and ideas from the perspective of "motivated cognition" would help? For example, I could try analyzing the most avid critics of LW (SneerClub) through the lens of motivated cognition. Or the argumentation in the Sequences.

My general reaction: I can interpret you as saying reasonable, interesting, and important things here, but your presentation is quite sloppy and makes it hard to be convinced by your lines of reasoning, since I'm often not sure you know you're saying what I interpret you to be saying. (...) right now there's not enough specificity to really agree or disagree with you without interpolating a lot of details.

It may be useful to consider another frame besides "agree/disagree": feeling or not feeling motivated to analyze MC in such depth and in such contexts. Like "I see this fellow (Q Home) analyzes MC in such and such contexts. Would I analyze MC in such a context, at such depth? If not, why would I stop my analysis at some point?". And if the post inspired any important thoughts, feel free to write about them, even if it turns out that I didn't mean them.

Comment by Q Home on Truth seeking is motivated cognition · 2022-11-23T03:33:07.521Z · LW · GW

I wrote a post about motivated cognition in epistemology, a version of "the problem of the criterion" and (a bit) about different theories of truth. If you want, I would be happy to discuss some of it with you.

Comment by Q Home on Q Home's Shortform · 2022-10-26T07:37:35.918Z · LW · GW

The whole draft is here (and the newer one is here).

Edit: my latest draft should be here.

Comment by Q Home on Moorean Statements · 2022-10-25T02:03:48.141Z · LW · GW

I know that there's no strangeness from the formal point of view. But that doesn't mean there's no strangeness in general, or that the situation isn't similar to the Moore paradox. Your examples are not 100% Moore statements either. Isn't the point of the discussion to find interesting connections between the Moore paradox and other things?

The AGI knows what you meant to do; it just cares about the different thing you accidentally instilled in it, and so doesn't care about what you wanted.

I know that the classical way to formulate it is "AI knows, but doesn't care".

I thought it might be interesting to formulate it as "AI knows, but doesn't believe". It may be interesting to think about what type of AI this formulation could be true for. For such an AI, alignment would mean resolving the Moore paradox. For example, imagine an AI with a very strong OCD to make people smile.