Posts

H5N1. Just how bad is the situation? 2023-07-08T22:09:33.928Z
Three levels of exploration and intelligence 2023-03-16T10:55:02.196Z
Maybe you can learn exotic experiences via analytical thought 2023-01-20T01:50:48.938Z
Motivated Cognition and the Multiverse of Truth 2022-11-22T12:51:26.405Z
Method of statements: an alternative to taboo 2022-11-16T10:57:49.937Z
The importance of studying subjective experience 2022-10-21T08:43:25.191Z
What if human reasoning is anti-inductive? 2022-10-11T05:15:49.373Z
Statistics for objects with shared identities 2022-10-03T09:21:15.884Z
About Q Home 2022-09-28T04:56:44.586Z
Probabilistic reasoning for description and experience 2022-09-27T10:57:06.217Z
Should AI learn human values, human norms or something else? 2022-09-17T06:19:16.482Z
Ideas of the Gaps 2022-09-13T10:55:47.772Z
Can you force a neural network to keep generalizing? 2022-09-12T10:14:27.181Z
Can "Reward Economics" solve AI Alignment? 2022-09-07T07:58:49.397Z
Informal semantics and Orders 2022-08-27T04:17:09.327Z
Vague concepts, family resemblance and cluster properties 2022-08-20T10:21:31.475Z
Statistics for vague concepts and "Colors" of places 2022-08-19T10:33:53.540Z
Q Home's Shortform 2022-08-16T23:52:08.773Z
Content generation. Where do we draw the line? 2022-08-09T10:51:37.446Z
Thinking without priors? 2022-08-02T09:17:45.622Z
Relationship between subjective experience and intelligence? 2022-07-24T09:10:10.494Z
Inward and outward steelmanning 2022-07-14T23:32:47.452Z

Comments

Comment by Q Home on Making a conservative case for alignment · 2024-11-30T04:56:12.733Z · LW · GW

Agree that neopronouns are dumb. Wikipedia says they're used by 4% of LGBTQ people and criticized both within and outside the community.

But for people struggling with normal pronouns (he/she/they), I have the following thoughts:

  • Contorting language to avoid words associated with beliefs... is not easier than using the words. Don't project beliefs onto words too hard.
  • Contorting language to avoid words associated with beliefs... is still a violation of free speech (if we have such a strong notion of free speech). So what is the motivation to propose that? It's a bit like a dog in the manger. "I'd rather cripple myself than help you, let's suffer together".
  • Don't maximize free speech (in a negligible way) while ignoring every other human value.
  • In an imperfect society, truly passive tolerance (tolerance which doesn't require any words/actions) is impossible. For example, in a perfect society, if my school has bigoted teachers, it immediately gets outcompeted by a non-bigoted school. In an imperfect society it might not happen. So we get enforceable norms.

Employees get paid, which kinda automatically reduces their free speech, because saying the wrong words can make them stop getting paid. (...) Employment is really a different situation. You get laws, and recommendations of your legal department; there is not much anyone can do about that.

I'm not familiar with your model of free speech (i.e. how you imagine free speech working if laws and power balances were optimal). People who value free speech usually believe that free speech should have power above money and property, to a reasonable degree. What's "reasonable" is the crux.

I think in situations where people work together on something unrelated to their beliefs, prohibiting the enforcement of a code of conduct is unreasonable, because respect is crucial for the work environment and for protecting marginalized groups. I assume people who propose to "call everyone they" or "call everyone by proper name" realize some of that.

If I let people use my house as a school, but find out that a teacher openly doesn't respect minority students (by refusing to do the smallest thing for them), I'm justified in not letting the teacher into my house.

I do not talk about people's past for no good reason, and definitely not just to annoy someone else. But if I have a good reason to point out that someone did something in the past, and the only way to do that is to reveal their previous name, then I don't care about the taboo.

I just think "disliking deadnaming under most circumstances = magical thinking, like calling Italy Rome" was a very strong, barely argued/explained opinion. In tandem with mentioning delusion (Napoleon) and hysteria. If you want to write something insulting, maybe bother to clarify your opinions a little bit more? Like you did in our conversation.

Comment by Q Home on Making a conservative case for alignment · 2024-11-29T08:38:36.413Z · LW · GW

I think there should be more spaces where controversial ideas can be debated. I'm not against spaces without pronoun rules, I just don't think every place should be like this. Also, if we create a space for political debate, we need to really make sure that the norms don't punish everyone who opposes centrism & the right. (Over-sensitive norms like "if you said that some opinion is transphobic you're uncivil/shaming/manipulative and should get banned" might do this.) Otherwise it's not free speech either. It will just produce another Grey or Red Tribe instead of a Red/Blue/Grey debate platform.

I do think progressives underestimate free speech damage. To me it's the biggest issue with the Left. Though I don't think they're entirely wrong about free speech.

For example, imagine I have trans employees. Another employee (X) refuses to use pronouns, in principle (using pronouns is not the same as accepting progressive gender theories). Why? Maybe X thinks my trans employees live such a great lie that using pronouns is already an unacceptable concession. Or maybe X thinks that even trying to switch "he" & "she" is too much work, and I'm not justified in asking to do that work because of absolute free speech. Those opinions seem unnecessarily strong and they're at odds with the well-being of my employees, my work environment. So what now? Also, if pronouns are an unacceptable concession, why isn't calling a trans woman by her female name an unacceptable concession?

Imagine I don't believe something about a minority, so I start avoiding words which might suggest otherwise. If I don't believe that gay love can be as true as straight love, I avoid the word "love" (in reference to gay people or to anybody) at work. If I don't believe that women are as smart as men, I avoid the word "master" / "genius" (in reference to women or anybody) at work. It can get pretty silly. Will predictably cost me certain jobs.

Comment by Q Home on Making a conservative case for alignment · 2024-11-28T11:18:14.793Z · LW · GW

I'll describe my general thoughts, like you did.

I think about transness in a similar way to how I think about homo/bisexuality.

  • If homo/bisexuality is outlawed, people are gonna suffer. Bad.
  • If I could erase homo/bisexuality from existence without creating suffering, I wouldn't anyway. Would be a big violation of people's freedom to choose their identity and actions (even if in practice most people don't actually "choose" to be homo/bisexual).
  • Different people have homo/bisexuality of different "strength" and form. One man might fall in love with another man, but dislike sex or even kissing. Maybe he isn't a real homosexual, if he doesn't need to prove it physically? Another man might identify as a bisexual, but be in a relationship with a woman... he doesn't get to prove his bisexuality (sexually or romantically). Maybe we shouldn't trust him unless he walks the talk? As a result of all such situations, we might have certain "inconsistencies": some people identifying as straight have done more "gay" things than people identifying as gay. My opinion on this? I think all of this is OK. Pushing for an "objective gay test" would be dystopian and suffering-inducing. I don't think it's an empirical matter (unless we choose it to be, which is a value-laden choice). Even if it was, we might be very far away from resolving it. So just respecting people's self-identification in the meantime is best, I believe. Moreover, a lot of this is very private information anyway. Less reason to try measuring it "objectively".

My thoughts about transness specifically:

  1. We strive for gender equality (I hope). Which makes the concept of gender less important for society as a whole.
  2. The concept of gender is additionally damaged by all the things a person can decide to do in their social/sexual life. For example, take an "assigned male at birth" (AMAB) person. AMAB can appear and behave very feminine without taking hormones. Or vice-versa (take hormones, get a pair of boobs, but present masculine). Additionally there are different degrees of medical transition and different types of sexual preferences.
  3. A lot of things which make someone more or less similar to a man/woman (behavior with friends, behavior with romantic partners, behavior with sexual partners, thoughts) are private. Less reason to try measuring those "objectively".
  4. I have a choice to respect people's self-identified genders or not. I decide to respect them. Not just because I care about people's feelings, but also because of points 1 & 2 & 3 and because of my general values (I show similar respect to homo/bisexuals). So I respect pronouns, but on top of that I also respect if someone identifies as a man/woman/nonbinary. I believe respect is optimal in terms of reducing suffering and adhering to human values.

When I compare your opinion to mine, most of my confusion is about two things: what exactly do you see as an empirical question? how does the answer (or its absence) affect our actions?

Zack insists that Blanchard is right, and that I fail at rationality if I disagree with him. People on Twitter and Reddit insist that Blanchard is wrong, and that I fail at being a decent human if I disagree with them. My opinion is that I have no comparative advantage at figuring out who is right and who is wrong on this topic, or maybe everyone is wrong, anyway it is an empirical question and I don't have the data. I hope that people who have more data and better education will one day sort it out, but until that happens, my position firmly remains "I don't know (and most likely neither do you), stop bothering me".

I think we need to be careful to not make a false equivalence here:

  1. Trans people want us to respect their pronouns and genders.
  2. I'm not very familiar with Blanchard, so far it seems to me like Blanchard's work is (a) just a typology for predicting certain correlations and (b) this work is sometimes used to argue that trans people are mistaken about their identities/motivations.

2A is kinda tangential to 1. So is this really a case of competing theories? I think uncertainty should make one skeptical of the implications of Blanchard's work rather than skeptical about respecting trans people.

(Note that this is about the representatives, not the people being represented. Two trans people can have different opinions, but you are required to believe the woke one and oppose the non-woke one.) Otherwise, you are transphobic. I completely reject that.

Two homo/bisexuals can have different opinions on what "true homo/bisexuality" is, too. Some opinions can be pretty negative. Yes, that's inconvenient, but that's just an expected course of events.

Shortly: disagreement is not hate. But it often gets conflated, especially in environments that overwhelmingly contain people of one political tribe.

I feel it's just the nature of some political questions. You can't treat disagreement as something benign for every question or in every space.

But if there is a person who actually feels dysphoria from not being addressed as "ve" (someone who would be triggered by calling them any of: "he", "she", or "they"), then I believe that this is between them and their psychiatrist, and I want to be left out of this game.

Agree. Also agree that lynching for accidental misgendering is bad.

(That's when you get the "attack helicopters" as an attempt to point out the absurdity of the system.)

I'm pretty sure the helicopter argument began as an argument against trans people, not as an argument against weird-ass novel pronouns.

Comment by Q Home on Q Home's Shortform · 2024-11-27T08:44:01.944Z · LW · GW

Draft of a future post, any feedback is welcome. Continuation of a thought from this shortform post.


(picture: https://en.wikipedia.org/wiki/Drawing_Hands)

The problem

There's an alignment-related problem: how do we make an AI care about causes of a particular sensory pattern? What are "causes" of a particular sensory pattern in the first place? You want the AI to differentiate between "putting a real strawberry on a plate" and "creating a perfect illusion of a strawberry on a plate", but what's the difference between doing real things and creating perfect illusions, in general?

(Relevant topics: environmental goals; identifying causal goal concepts from sensory data; "look where I'm pointing, not at my finger"; Pointers Problem; Eliciting Latent Knowledge; symbol grounding problem; ontology identification problem.)

I have a general answer to those questions. My answer is very unfinished. Also, it isn't mathematical; it's philosophical in nature. But I believe it's important anyway, because there aren't many philosophical or non-philosophical ideas about the questions above. With questions like these you don't know where to even start thinking, so it's hard to imagine even a bad answer.

Obvious observations

Observation 1. Imagine you come up with a model which perfectly predicts your sensory experience (Predictor). Just having this model is not enough to understand causes of a particular sensory pattern, i.e. differentiate between stuff like "putting a real strawberry on a plate" and "creating a perfect illusion of a strawberry on a plate".

Observation 2. Not every Predictor has variables which correspond to causes of a particular sensory pattern. Not every Predictor can be used to easily derive something corresponding to causes of a particular sensory pattern. For example, some Predictors might make predictions by simulating a large universe with a superintelligent civilization inside which predicts your sensory experiences. See "Transparent priors".


The solution

So, what are causes of a particular sensory pattern?

"Recursive Sensory Models" (RSMs).

I'll explain what an RSM is and provide various examples.

What is a Recursive Sensory Model?

An RSM is a sequence of N models (Model 1, Model 2, ..., Model N) for which the following two conditions hold true:

  • Model (K + 1) is good at predicting more aspects of sensory experience than Model (K). Model (K + 2) is good at predicting more aspects than Model (K + 1). And so on.
  • Model 1 can be transformed into any of the other models according to special transformation rules. Those rules are supposed to be simple. But I can't give a fully general description of those rules. That's one of the biggest unfinished parts of my idea.

The second bullet point is kinda the most important one, but it's very underspecified. So you can only get a feel for it through looking at specific examples.

Core claim: when the two conditions hold true, the RSM contains easily identifiable "causes" of particular sensory patterns. The two conditions are necessary and sufficient for the existence of such "causes". The universe contains "causes" of particular sensory patterns to the extent to which statistical laws describing the patterns also describe deeper laws of the universe.
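A minimal sketch of how the two conditions could be checked, under loud assumptions: "predicting more aspects" is read here as a strict superset of a set of named aspects, and the second condition is delegated to a caller-supplied check because the transformation rules are left unspecified above. The names `covers_more_each_step`, `is_rsm` and `reachable_from_first` are hypothetical and only serve the illustration.

```python
from typing import Callable, Sequence, Set


def covers_more_each_step(coverage: Sequence[Set[str]]) -> bool:
    """Condition 1 (assumed reading): Model K+1 predicts a strict superset
    of the sensory aspects that Model K predicts."""
    return all(a < b for a, b in zip(coverage, coverage[1:]))


def is_rsm(coverage: Sequence[Set[str]],
           reachable_from_first: Callable[[int], bool]) -> bool:
    """Check both conditions for a candidate sequence of models.

    coverage[k] is the set of aspects predicted by Model k+1.
    reachable_from_first(k) stands in for condition 2: it should say whether
    Model 1 can be transformed into the model at index k by "simple" rules.
    What counts as simple is not defined in the text, so it is left to the caller.
    """
    cond2 = all(reachable_from_first(k) for k in range(1, len(coverage)))
    return covers_more_each_step(coverage) and cond2


# Hypothetical aspect names, loosely following the object permanence example.
coverage = [
    {"landscape while looking"},
    {"landscape while looking", "eyes closed", "looking away"},
]
result = is_rsm(coverage, lambda k: True)
```

The sketch only makes the shape of the definition concrete; all of the real difficulty sits inside `reachable_from_first`.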

Example: object permanence

Imagine you're looking at a landscape with trees, lakes and mountains. You notice that none of those objects disappear.

It seems like a good model: "most objects in the 2D space of my vision don't disappear". (Model 1)

But it's not perfect. When you close your eyes, the landscape does disappear. When you look at your feet, the landscape does disappear.

So you come up with a new model: "there is some 3D space with objects; the space and the objects are independent from my sensory experience; most of the objects don't disappear". (Model 2)

Model 2 is better at predicting the whole of your sensory experience.

However, note that the "mathematical ontology" of both models is almost identical. (Both models describe spaces whose points can be occupied by something.) They're just applied to slightly different things. That's why "recursion" is in the name of Recursive Sensory Models: an RSM reveals similarities between different layers of reality. As if reality is a fractal.

Intuitively, Model 2 describes "causes" (real trees, lakes and mountains) of sensory patterns (visions of trees, lakes and mountains).
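The comparison between the two models can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `Frame` record, the hand-written sequence of frames, and the idea that the observer knows whether their own eyes are open and where they are looking. The toy world is generated by the same rule Model 2 uses, so the point is only to show both models applying the same "things persist in a space" idea to different spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    eyes_open: bool
    facing_landscape: bool
    tree_visible: bool  # what actually shows up in the visual field


def make_frames():
    # Hypothetical world: a tree that always exists and is seen only when looked at.
    controls = [(True, True), (True, True), (False, True), (True, True),
                (True, False), (True, True), (True, True)]
    return [Frame(o, f, o and f) for o, f in controls]


def model_1(frames, t):
    """2D persistence: whatever was in the visual field at t-1 is still there at t."""
    return frames[t - 1].tree_visible


def model_2(frames, t):
    """3D persistence: the tree persists in the world regardless of observation;
    it is seen exactly when the eyes are open and pointed at it."""
    tree_exists = True
    return tree_exists and frames[t].eyes_open and frames[t].facing_landscape


def accuracy(model, frames):
    # Score predictions for every frame after the first one.
    hits = sum(model(frames, t) == frames[t].tree_visible
               for t in range(1, len(frames)))
    return hits / (len(frames) - 1)


frames = make_frames()
acc_1 = accuracy(model_1, frames)
acc_2 = accuracy(model_2, frames)
```

On this sequence Model 1 fails whenever the eyes close or the gaze moves away, while Model 2 predicts every frame.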

Example: reductionism

You notice that most visible objects move smoothly (don't disappear, don't teleport).

"Most visible objects move smoothly in a 2D/3D space" is a good model for predicting sensory experience. (Model 1)

But there's a model which is even better: "visible objects consist of smaller and invisible/less visible objects (cells, molecules, atoms) which move smoothly in a 2D/3D space". (Model 2)

However, note that the mathematical ontology of both models is almost identical.

Intuitively, Model 2 describes "causes" (atoms) of sensory patterns (visible objects).

Example: a scale model

Imagine you're alone in a field with rocks of different size and a scale model of the whole environment. You've already learned object permanence.

"Objects don't move in space unless I push them" is a good model for predicting sensory experience. (Model 1)

But it has a little flaw. When you push a rock, the corresponding rock in the scale model moves too. And vice-versa.

"Objects don't move in space unless I push them; there's a simple correspondence between objects in the field and objects in the scale model" is a better model for predicting sensory experience. (Model 2)

However, note that the mathematical ontology of both models is identical.

Intuitively, Model 2 describes a "cause" (the scale model) of sensory patterns (rocks of different size being at certain positions). Though you can reverse the cause and effect here.

Example: empathy

If you put your hand on a hot stove, you quickly move the hand away. Because it's painful and you don't like pain. This is a great model (Model 1) for predicting your own movements near a hot stove.

But why do other people avoid hot stoves? If another person touches a hot stove, pain isn't instantiated in your sensory experience.

Behavior of other people can be predicted with this model: "people have similar sensory experience and preferences, inaccessible to each other". (Model 2)

However, note that the mathematical ontology of both models is identical.

Intuitively, Model 2 describes a "cause" (inaccessible sensory experience) of sensory patterns (other people avoiding hot stoves).

Counterexample: a chaotic universe

Imagine yourself in a universe where your sensory experience is produced by very simple, but very chaotic laws. Despite the chaos, your sensory experience contains some simple, relatively stable patterns. Purely by accident.

In such a universe, RSMs might not find any "causes" underlying particular sensory patterns (except the simple chaotic laws).

But in such a case there are probably no "causes".

Comment by Q Home on Making a conservative case for alignment · 2024-11-27T05:58:36.938Z · LW · GW

Napoleon is merely an argument for "just because you strongly believe it, even if it is a statement about you, does not necessarily make it true".

When people make arguments, they often don't list all of the premises. That's not unique to trans discourse. Informal reasoning is hard to make fully explicit. "Your argument doesn't explicitly exclude every counterexample" is a pretty cheap counter-argument. What people experience is important evidence and an important factor; it's rational to bring it up instead of stopping yourself with "wait, I'm not allowed to bring that up unless I make an analytically bulletproof argument". For example, if you trust someone that they feel strongly about being a woman, there's no reason to suspect them of being a cosplayer who chases Twitter popularity.

I expect that you will disagree with a lot of this, and that's okay; I am not trying to convince you, just explaining my position.

I think I still don't understand the main conflict which bothers you. I thought it was "I'm not sure if trans people are deluded in some way (like Napoleons, but milder) or not". But now it seems like "I think some people really suffer and others just cosplay, the cosplayers take something away from true sufferers". What is taken away?

Comment by Q Home on Making a conservative case for alignment · 2024-11-26T06:06:33.338Z · LW · GW

Even if we assume that there should be a crisp physical cause of "transness" (which is already a value-laden choice), we need to make a couple of value-laden choices before concluding if "being trans" is similar to "believing you're Napoleon" or not. Without more context it's not clear why you bring up Napoleon. I assume the idea is "if gender = hormones (gender essentialism), and trans people have the right hormones, then they're not deluded". But you can arrive at the same conclusion ("trans people are not deluded") by means other than gender essentialism.

I assume that for trans people being trans is something more than mere "choice"

There doesn't need to be a crisp physical cause of "transness" for "transness" to be more than mere choice. There's a big spectrum between "immutable physical features" and "things which can be decided on a whim".

If you introduce yourself as "Jane" today, I will refer to you as "Jane". But if 50 years ago you introduced yourself as "John", that is a fact about the past. I am not saying that "you were John" as some kind of metaphysical statement, but that "everyone, including you, referred to you as John" 50 years ago, which is a statement of fact.

This just explains your word usage, but doesn't make a case that disliking deadnaming is magical thinking.

I've decided to comment because bringing up Napoleon, hysteria and magical thinking all at once is egregiously bad faith. I think it's not a good epistemic norm to imply something like "the arguments of the outgroup are completely inconsistent trash" without elaborating.

Comment by Q Home on Making a conservative case for alignment · 2024-11-25T05:59:13.577Z · LW · GW

There are people who feel strongly that they are Napoleon. If you want to convince me, you need to make a stronger case than that.

It's confusing to me that you go to the "I identify as an attack helicopter" argument after treating biological sex as private information & respecting pronouns out of politeness. I thought you already realized that "choosing your gender identity" and "being deluded you're another person" are different categories.

If someone presented as male for 50 years, then changed to female, it makes sense to use "he" to refer to their first 50 years, especially if this is the pronoun everyone used at that time. Also, I will refer to them using the name they actually used at that time. (If I talk about the Ancient Rome, I don't call it Italian Republic either.) Anything else feels like magical thinking to me.

The alternative (using new pronouns / name) makes perfect sense too, for trivial reasons, such as respecting a person's wishes. You went too far calling it magical thinking. A piece of land is different from a person in two important ways: (1) it doesn't feel anything no matter what you call it, (2) there are weaker reasons to treat it as a single entity across time.

Comment by Q Home on Evolution's selection target depends on your weighting · 2024-11-20T02:49:49.380Z · LW · GW

Meta-level comment: I don't think it's good to dismiss original arguments immediately and completely.

Object-level comment:

Neither of those claims has anything to do with humans being the “winners” of evolution.

I think it might be more complicated than that:

  1. We need to define what "a model produced by a reward function" means, otherwise the claims are meaningless. Like, if you made just a single update to the model (based on the reward function), calling it "a model produced by the reward function" is meaningless ('cause no real optimization pressure was applied). So we do need to define some goal of optimization (which determines who's a winner and who's a loser).
  2. We need to argue that the goal is sensible. I.e. somewhat similar to a goal we might use while training our AIs.

Here are some things we can try:

  • We can try defining all currently living species as winners. But is it sensible? Is it similar to a goal we would use while training our AIs? "Let's optimize our models for N timesteps and then use all surviving models regardless of any other metrics" <- I think that's not sensible, especially if you use an algorithm which can introduce random mutations into the model.
  • We can try defining species which avoided substantial changes for the longest time as winners. This seems somewhat sensible, because those species experienced the longest optimization pressure. But then humans are not the winners.
  • We can define any species which gained general intelligence as winners. Then humans are the only winners. This is sensible because of two reasons. First, with general intelligence deceptive alignment is possible: if humans knew that Simulation Gods optimize organisms for some goal, humans could focus on that goal or kill all competing organisms. Second, many humans (in our reality) value creating AGI more than solving any particular problem.

I think the latter is the strongest counter-argument to "humans are not the winners".

Comment by Q Home on Q Home's Shortform · 2024-11-19T06:26:58.321Z · LW · GW

My point is that chairs and humans can be considered in a similar way.

Please explain how your point connects to my original message: are you arguing with it or supporting it or want to learn how my idea applies to something?

Comment by Q Home on Q Home's Shortform · 2024-11-19T02:23:02.566Z · LW · GW

I see. But I'm not talking about figuring out human preferences, I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified. Sorry if it wasn't clear in my original message because I mentioned "caring".

Models or real objects or things capture something that is not literally present in the world. The world contains shadows of these things, and the most straightforward way of finding models is by looking at the shadows and learning from them.

You might need to specify what you mean a little bit.

The most straightforward way of finding a world-model is just predicting your sensory input. But then you're not guaranteed to get a model in which something corresponding to "real objects" can be easily identified. That's one of the main reasons why ELK is hard, I believe: in an arbitrary world-model, "Human Simulator" can be much simpler than "Direct Translator".

So how do humans get world-models in which something corresponding to "real objects" can be easily identified? My theory is in the original message. Note that the idea is not just "predict sensory input", it has an additional twist.

Comment by Q Home on Q Home's Shortform · 2024-11-18T08:05:04.878Z · LW · GW

Creating an inhumanly good model of a human is related to formulating their preferences.

How does this relate to my idea? I'm not talking about figuring out human preferences.

Thus it's a step towards eliminating path-dependence of particular life stories

What is "path-dependence of particular life stories"?

I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate.

Are there other ways to characterize objects? Feels like a very general (or even fully general) framework. I believe my idea can be framed like this, too.

Comment by Q Home on Q Home's Shortform · 2024-11-17T08:35:53.238Z · LW · GW

There's an alignment-related problem, the problem of defining real objects. Relevant topics: environmental goals; task identification problem; "look where I'm pointing, not at my finger"; The Pointers Problem; Eliciting Latent Knowledge.

I think I realized how people go from caring about sensory data to caring about real objects. But I need help with figuring out how to capitalize on the idea.

So... how do humans do it?

  1. Humans create very small models for predicting very small/basic aspects of sensory input (mini-models).
  2. Humans use mini-models as puzzle pieces for building models for predicting ALL of sensory input.
  3. As a result, humans get models in which it's easy to identify "real objects" corresponding to sensory input.

For example, imagine you're just looking at ducks swimming in a lake. You notice that ducks don't suddenly disappear from your vision (permanence), their movement is continuous (continuity) and they seem to move in a 3D space (3D space). All those patterns ("permanence", "continuity" and "3D space") are useful for predicting aspects of immediate sensory input. But all those patterns are also useful for developing deeper theories of reality, such as the atomic theory of matter. Because you can imagine that atoms are small things which continuously move in 3D space, similar to ducks. (This image stops working as well when you get to Quantum Mechanics, but then aspects of QM feel less "real" and less relevant for defining objects.) As a result, it's easy to see how the deeper model relates to surface-level patterns.

In other words: reality contains "real objects" to the extent to which deep models of reality are similar to (models of) basic patterns in our sensory input.

Comment by Q Home on Stable Pointers to Value II: Environmental Goals · 2024-11-06T07:46:41.715Z · LW · GW

I don't understand the Model-Utility Learning (MUL) section: what pathological behavior does the AI exhibit?

Since humans (or something) must be labeling the original training examples, the hypothesis that building bridges means “what humans label as building bridges” will always be at least as accurate as the intended classifier. I don’t mean “whatever humans would label”. I mean the hypothesis that “build a bridge” means specifically the physical situations which were recorded as training examples for this system in particular, and labeled by humans as such.

So it's like overfitting? If I train MUL AI to play piano in a green room, MUL AI learns that "playing piano" means "playing piano in a green room" or "playing piano in a room which would be chosen for training me in the past"?

Now, we might reasonably expect that if the AI considers a novel way of “fooling itself” which hasn’t been given in a training example, it will reject such things for the right reasons: the plan does not involve physically building a bridge.

But "sensory data being a certain way" is a physical event which happens in reality, so MUL AI might still learn to be a solipsist? MUL doesn't guarantee to solve misgeneralization in any way?

If the answer to my questions is "yes", what did we even hope for with MUL?

Comment by Q Home on Being nicer than Clippy · 2024-04-30T08:35:37.616Z · LW · GW

I'm noticing two things:

  1. It's suspicious to me that values of humans-who-like-paperclips are inherently tied to acquiring an unlimited amount of resources (no matter in which way). Maybe I don't treat such values as 100% innocent, so I'm OK keeping them in check. Though we can come up with thought experiments where the urge to get more resources is justified by something. Like, maybe instead of producing paperclips those people want to calculate Busy Beaver numbers, so they want more and more computronium for that.
  2. How consensual were the trades if their outcome is predictable and other groups of people don't agree with the outcome? Looks like coercion.

Comment by Q Home on Examples of Highly Counterfactual Discoveries? · 2024-04-24T06:00:46.888Z · LW · GW

Often I see people dismiss the things the Epicureans got right with an appeal to their lack of the scientific method, which has always seemed a bit backwards to me.

The most important thing, I think, is not even hitting the nail on the head, but knowing (i.e. really acknowledging) that a nail can be hit in multiple places. If you know that, the rest is just a matter of testing.

Comment by Q Home on Why I no longer identify as transhumanist · 2024-02-06T09:50:24.619Z · LW · GW

But avoidance of value drift or of unendorsed long term instability of one's personality is less obvious.

What if endorsed long term instability leads to negation of personal identity too? (That's something I thought about.)

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-05T23:33:34.947Z · LW · GW

I think corrigibility is the ability to change a value/goal system. That's the literal meaning of the term... "Correctable". If an AI were fully aligned, there would be no need to correct it.

Perhaps I should make a better argument:

It's possible that AGI is correctable, but (a) we don't know what needs to be corrected or (b) we cause new, less noticeable problems, while correcting AGI.

So, I think there's not two assumptions "alignment/interpretability is not solved + AGI is incorrigible", but only one — "alignment/interpretability is not solved". (A strong version of corrigibility counts as alignment/interpretability being solved.)

Yes, and that's the specific argument I am addressing, not AI risk in general. Except that if it's many many times smarter, it's ASI, not AGI.

I disagree that "doom" and "AGI going ASI very fast" are certain (> 90%) too.

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-04T22:41:26.433Z · LW · GW

It's not aligned at every possible point in time.

I think corrigibility is "AGI doesn't try to kill everyone and doesn't try to prevent/manipulate its modification". Therefore, in some global sense such AGI is aligned at every point in time. Even if it causes a local disaster.

Over 90%, as I said

Then I agree, thank you for re-explaining your opinion. But I think other probabilities count as high too.

To me, the ingredients of danger (but not "> 90%") are those:

  • 1st. AGI can be built without Alignment/Interpretability being solved. If that's true, building AGI slowly or being able to fix visible problems may not matter that much.
  • 2nd and 3rd. AGI can have planning ability. AGI can come up with a goal whose pursuit would kill everyone.
  • 2nd (alternative). AIs and AGIs can kill most humans without real intention of doing so, by destabilizing the world/amplifying already existing risks.

If I remember correctly, Eliezer also believes in "intelligence explosion" (AGI won't be just smarter than humanity, but many-many times smarter than humanity: like humanity is smarter than ants/rats/chimps). Haven't you forgotten to add that assumption?

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-04T02:15:15.674Z · LW · GW

why is “superintelligence + misalignment” highly conjunctive?

In the sense that matters, it needs to be fast, surreptitious, incorrigible, etc.

What opinion are you currently arguing? That the risk is below 90% or something else? What counts as "high probability" for you?

Incorrigible misalignment is at least one extra assumption.

I think "corrigible misalignment" doesn't exist, corrigible AGI is already aligned (unless AGI can kill everyone very fast by pure accident). But we can have differently defined terms. To avoid confusion, please give examples of scenarios you're thinking about. The examples can be very abstract.

If AGI is AGI, there won’t be any problems to notice

Huh?

I mean, you haven't explained what "problems" you're talking about. AGI suddenly declaring "I think killing humans is good, actually" after looking aligned for 1 year? If you didn't understand my response, a more respectful answer than "Huh?" would be to clarify your own statement. What noticeable problems did you talk about in the first place?

Please, proactively describe your opinions. Is it too hard to do? Conversation takes two people.

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-03T00:10:05.933Z · LW · GW

I've confused you with people who deny that a misaligned AGI is even capable of killing most humans. Glad to be wrong about you.

But I am not saying that the doom is unlikely given superintelligence and misalignment, I am saying the argument that gets there -- superintelligence + misalignment -- is highly conjunctive. The final step, the execution as it were, is not highly conjunctive.

But I don't agree that it's highly conjunctive.

  • If AGI is possible, then its superintelligence is a given. Superintelligence isn't given only if AGI stops at human level of intelligence + can't think much faster than humans + can't integrate abilities of narrow AIs naturally. (I.e. if AGI is basically just a simulation of a human and has no natural advantages.) I think most people don't believe in such AGI.
  • I don't think misalignment is highly conjunctive.

I agree that hard takeoff is highly conjunctive, but why is "superintelligence + misalignment" highly conjunctive?

I think it's needed for the "likely". Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn't that obvious?

If AGI is AGI, there won't be any problems to notice. That's why I think probability doesn't decrease enough.

...

I hope that Alignment is much easier to solve than it seems. But I'm not sure (a) how much weight to put into my own opinion and (b) how much my probability of being right decreases the risk.

Comment by Q Home on AI #27: Portents of Gemini · 2023-12-01T22:29:28.858Z · LW · GW

Yes, I probably mean something other than ">90%".

[lists of various catastrophes. many of which have nothing to do with AI]

Why are you doing this? I did not say there is zero risk of anything. (...) Are you using "risk" to mean the probability of the outcome, or the impact of the outcome?

My argument is based on comparing the phenomenon of AGI to other dangerous phenomena. The argument is intended to show that bad outcome is likely (if AGI wants to do a bad thing, it can achieve it) and that impact of the outcome can kill most humans.

I think it's needed for the "likely". Slow takeoff gives humans more time to notice and fix problems, so the likelihood of bad outcomes goes down. Wasn't that obvious?

To me the likelihood doesn't go down enough (to the tolerable levels).

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-27T05:01:51.965Z · LW · GW

Informal logic is more holistic than not, I think, because it relies on implicit assumptions.

It's not black and white. I don't think they are zero risk, and I don't think it is Certain Doom, so it's not what I am talking about. Why are you bringing it up? Do you think there is a simpler argument for Certain Doom?

Could you proactively describe your opinion? Or re-describe it, by adding relevant details. You seemed to say "if hard takeoff, then likely doom; but hard takeoff is unlikely, because hard takeoff requires a conjunction of things to be true". I answered that I don't think hard takeoff is required. You didn't explain that part of your opinion. Now it seems your opinion is more general (not focused on hard takeoff), but you refuse to clarify it. So, what is the actual opinion I'm supposed to argue with? I won't try to use every word against you, so feel free to write more.

Doom meaning what? It's obvious that there is some level of risk, but some level of risk isn't Certain Doom. Certain Doom is an extraordinary claim, and the burden of proof therefore is on (certain) doomers. But you seem to be switching between different definitions.

I think "AGI is possible" or "AGI can achieve extraordinary things" is the extraordinary claim. The worry about its possible extraordinary danger is natural. Therefore, I think AGI optimists bear the burden of proving that a) likely risk of AGI is bounded by something and b) AGI can't amplify already existing dangers.

By "likely doom" I mean likely (near-)extinction. "Likely" doesn't have to be 90%.

Saying “the most dangerous technology with the worst safety and the worst potential to control it” doesn't actually imply a high level of doom (p>.9) or a high level of risk (> 90% dead) -- it's only a relative statement.

I think it does imply so, modulo "p > 90%". Here's a list of the most dangerous phenomena: (L1)

  • Nuclear warfare. World wars.
  • An evil and/or suicidal world-leader.
  • Deadly pandemics.
  • Crazy ideologies, e.g. fascism. Misinformation. Addictions. People being divided on everything. (Problems of people's minds.)

And a list of the most dangerous qualities: (L2)

  • Being superintelligent.
  • Wanting, planning to kill everyone.
  • Having a cult-following. Humanity being dependent on you.
  • Having direct killing power (like a deadly pandemic or a set of atomic bombs).
  • Multiplicity/simultaneity. E.g. if we had TWO suicidal world-leaders at the same time.

Things from L1 can barely scrape together two points from L2, yet they can cause mass disruptions and claim many victims and also trigger each other. Narrow AI could secure three points from the list (narrow superintelligence + cult-following, dependency + multiplicity/simultaneity), weakly, but potentially better than a powerful human ever could. However, AGI can easily secure three points from L2 in full. Four points, if AGI is developed in more than a single place. And I expect you to grant that general superintelligence presents a special, unpredictable danger.

Given that, I don't see what should bound the risk from AGI or prevent it from amplifying already existing dangers.

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-26T02:11:32.707Z · LW · GW

Why? I'm saying p(doom) is not high. I didn't mention P(otherstuff).

To be able to argue something (/decide how to go about arguing something), I need to have an idea about your overall beliefs.

That doesn't imply a high probability of mass extinction.

Could you clarify what your own opinion even is? You seem to agree that rapid self-improvement would mean likely doom. But you aren't worried about gradual self-improvement or AGI being dangerously smart without much (self-)improvement?

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-25T01:09:52.357Z · LW · GW

I think I have already answered that: I don't think anyone is going to deliberately build something they can't control at all. So the probability of mass extinction depends on creating an uncontrollable superintelligence accidentally -- for instance, by rapid recursive self improvement. And RRSI, AKA Foom Doom, is a conjunction of claims, all of which are p<1, so it is not high probability.

I agree that probability mostly depends on accidental AGI. I don't agree that probability mostly depends on (very) hard takeoff. I believe probability mostly depends on just "AGI being smarter than all of humanity". If you have a kill-switch or whatever, an AGI without Alignment theory being solved is still "the most dangerous technology with the worst safety and the worst potential to control it".
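The arithmetic behind the quoted "conjunction of claims" argument can be sketched in Python. This is only an illustration: the number of claims, their probabilities and their independence are all assumptions made here, not figures from the discussion. It shows why a conjunction shrinks as claims are added; it says nothing about whether doom actually requires such a conjunction, which is the point disputed above.

```python
from fractions import Fraction


def conjunction_probability(claim_probabilities):
    """Probability that all claims hold, ASSUMING the claims are independent."""
    result = Fraction(1)
    for p in claim_probabilities:
        result *= p
    return result


# Hypothetical numbers: five independent claims, each 90% likely.
five_claims = [Fraction(9, 10)] * 5
print(float(conjunction_probability(five_claims)))
```

With these made-up inputs the product is 0.9 to the fifth power, roughly 0.59. If the bad outcome needs fewer claims to hold, the list is shorter and the product stays higher.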

So, could you go into more cruxes of your beliefs, more context? (More or less full context of my own beliefs is captured by the previous comment. But I'm ready to provide more if needed.) To provide more context to your beliefs, you could try answering "what's the worst disaster (below everyone being dead) an AGI is likely to cause" or "what's the best benefit an AGI is likely to give". To make sure you aren't treating an AGI as impotent in negative scenarios and as a messiah in positive scenarios. Or not treating humans as incapable of sinking even a safe non-sentient boat and refusing to vaccinate from viruses.

Comment by Q Home on AI #27: Portents of Gemini · 2023-11-24T02:30:01.406Z · LW · GW

I want to discuss this topic with you iff you're ready to proactively describe the cruxes of your own beliefs. I believe in likely doom and I don't think the burden of proof is on "doomers".

Maybe there just isn't a good argument for Certain Doom (or at least high probability near-extinction). I haven't seen one

What do you expect to happen when you're building uninterpretable technology without safety guarantees, smarter than all of humanity? Looks like the most dangerous technology with the worst safety and the worst potential to control it.

To me, those abstract considerations are enough a) to conclude likely doom and b) to justify common folk in blocking AI capability research — if common folk could do so.

I believe experts should have accountability (even before a disaster happens) and owe some explanation of what they're doing. If an expert is saying "I'm building the most impactful technology without safety but that's suddenly OK this time around because... ... I can't say, you need to be an expert to understand", I think it's OK to not accept the answer and block the research.

Comment by Q Home on [Bias] Restricting freedom is more harmful than it seems · 2023-11-23T07:43:38.899Z · LW · GW

You are correct that critical thinkers may want to censor uncritical thinkers. However, independent-minded thinkers do not want to censor conventional-minded thinkers.

I still don't see it. Don't see a causal mechanism that would cause it. Even if we replace "independent-minded" with "independent-minded and valuing independent-mindedness for everyone". I have the same problems with it as Ninety-Three and Raphael Harth.

To give my own example. Algorithms in social media could be a little too good at radicalizing and connecting people with crazy opinions, such as flat earth. A person censoring such algorithms/their output could be motivated by the desire to make people more independent-minded.

I deliberately avoided examples for the same reason Paul Graham's What You Can't Say deliberately avoids giving any specific examples: because either my examples would be mild and weak (and therefore poor illustrations) or they'd be so shocking (to most people) they'd derail the whole conversation. (comment)

I think the value of a general point can only stem from re-evaluating specific opinions. Therefore, sooner or later the conversation has to tackle specific opinions.

If "derailment" is impossible to avoid, then "derailment" is a part of the general point. Or there are more important points to be discussed. For example, if you can't explain to cave people General Relativity, maybe you should explain "science" and "language" first — and maybe those tangents are actually more valuable than General Relativity.

I dislike Graham's essay for the same reason: when Graham does introduce some general opinions ("morality is like fashion", "censuring is motivated by the fear of free-thinking", "there's no prize for figuring out quickly", "a statement can't be worse than false"), they're not discussed critically, with examples. Re:say looks weird to me. Invisible opponents are allowed to say only one sentence and each sentence gets a lengthy "answer" with more opinions.

Comment by Q Home on [Bias] Restricting freedom is more harmful than it seems · 2023-11-22T10:47:25.365Z · LW · GW

We only censor other people more-independent-minded than ourselves. (...) Independent-minded people do not censor conventional-minded people.

I'm not sure that's true. Not sure I can interpret the "independent/dependent" distinction.

  • In "weirdos/normies" case, a weirdo can want to censor ideas of normies. For example, some weirdos in my country want to censor LGBTQ+ stuff. They already do.
  • In "critical thinkers/uncritical thinkers" case, people with more critical thinking may want to censor uncritical thinkers. (I believe so.) For example, LW in particular has a couple of ways to censor someone, direct and indirect.

In general, I like your approach of writing this post like an "informal theorem".

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T10:16:49.738Z · LW · GW

I tried to describe necessary conditions which are needed for society and culture to exist. Do you agree that what I've described are necessary conditions?

I realize I'm pretty unusual in the regard, which may be biasing my views. However, I think I am possibly evidence against the notion that a desire to leave a mark on the culture is fundamental to human identity

Relevant part of my argument was "if your personality gets limitlessly copied and modified, your personality doesn't exist (in the cultural sense)". You're talking about something different, you're talking about ambitions and desire for fame.


My thesis (to not lose the thread of the conversation):

If human culture and society are natural, then the rights about information are natural too, because culture/society can't exist without them.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T08:38:37.190Z · LW · GW

I think we can just judge by the consequences (here "consequences" don't have to refer to utility calculus). If some way of "injecting" art into culture is too disruptive, we can decide to not allow it. Doesn't matter who or how makes the injection.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T08:28:54.118Z · LW · GW

To exist — not only for itself, but for others — a consciousness needs a way to leave an imprint on the world. An imprint which could be recognized as conscious. Similar thing with personality. For any kind of personality to exist, that personality should be able to leave an imprint on the world. An imprint which could be recognized as belonging to an individual.

Uncontrollable content generation can, in principle, undermine the possibility of consciousness to be "visible" and undermine the possibility of any kind of personality/individuality. And without those things we can't have any culture or society except a hivemind.

Are you OK with such disintegration of culture and society?

In general, I think people have a right to hear other people, but not a right to be heard.

To me that's very repugnant, if taken to the absolute. What emotions and values motivate this conclusion? My own conclusions are motivated by caring about culture and society.


Alternatively, it could be the case that the artist has more to say that isn't or can't be expressed by the imitations- other ideas, interesting self expression, and so on- but the imitations prevent people from finding that new work. I think that case is a failure of whatever means people are using to filter and find art. A good social media algorithm or friend group who recommend content to each other should recognize that the inventor of a good idea might invent other good ideas in the future, and should keep an eye out for and platform those ideas if they do.

I was going for something slightly more subtle. Self-expression is about making a choice. If all choices are realized before you have a chance to make them, your ability to express yourself is undermined.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-20T00:56:35.991Z · LW · GW

Thank you for the answer, clarifies your opinion a lot!

Artistic expression, of course, is something very different. I'm definitely going to keep making art in my spare time for the rest of my life, for the sake of fun and because there are ideas I really want to get out. That's not threatened at all by AI.

I think there are some threats, at least hypothetical. For example, the "spam attack". People see that a painter starts to explore some very niche topic — and thousands of people start to generate thousands of paintings about the same very niche topic. And the very niche topic gets "pruned" in a matter of days, long before the painter has said at least 30% of what they have to say. The painter has to fade into obscurity or radically reinvent themselves after every couple of paintings. (Pre-AI the "spam attack" is not really possible even if you have zero copyright laws.)

In general, I believe for culture to exist we need to respect the idea "there's a certain kind of output I can get only from a certain person, even if it means waiting or not having every single of my desires fulfilled" in some way. For example, maybe you shouldn't use AI to "steal" a face of an actor and make them play whatever you want.

Do you think that unethical ways to produce content exist at least in principle? Would you consider any boundary for content production, codified or not, to be a zero-sum competition?

Comment by Q Home on It's OK to be biased towards humans · 2023-11-19T10:03:00.041Z · LW · GW

Maybe I've misunderstood your reply, but I wanted to say that hypothetically even humans can produce art in non-cooperative and disruptive ways, without breaking existing laws.

Imagine a silly hypothetical: one of the best human artists gets a time machine and starts offering their art for free. That artist functions like an image generator. Is such an artist doing something morally questionable? I would say yes.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-19T09:31:13.686Z · LW · GW

Could you explain your attitudes towards art and art culture more in depth and explain how exactly your opinions on AI art follow from those attitudes? For example, how much do you enjoy making art and how conditional is that enjoyment? How much do you care about self-expression, in what way? I'm asking because this analogy jumped out at me as a little suspicious:

And as terrible as this could be for my career, spending my life working in a job that could be automated but isn't would be as soul-crushing as being paid to dig holes and fill them in again. It would be an insultingly transparent facsimile of useful work.

But creative work is not mechanical work, it can't be automated that way, AI doesn't replace you that way. AI doesn't have the model of your brain, it can't make the choices you would make. It replaces you by making something cheaper and on the same level of "quality". It doesn't automate your self-expression. If you care about self-expression, the possibility of AI doesn't have to feel soul-crushing.

I apologize for sounding confrontational. You're free to disagree with everything above. I just wanted to show that the question has a lot of potential nuances.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-19T08:51:22.433Z · LW · GW

I like the angle you've explored. Humans are allowed to care about humans — and propagate that caring beyond its most direct implications. We're allowed to care not only about humans' survival, but also about human art and human communication and so on.

But I think another angle is also relevant: there are just cooperative and non-cooperative ways to create art (or any other output). If AI creates art in non-cooperative ways, it doesn't matter how the algorithm works or if it's sentient or not.

Comment by Q Home on It's OK to be biased towards humans · 2023-11-12T07:21:47.626Z · LW · GW

Thus, it doesn't matter in the least if it stifles human output, because the overwhelming majority of us who don't rely on our artistic talent to make a living will benefit from a post-scarcity situation for good art, as customized and niche as we care to demand.

How do you know that? Art is one of the biggest outlets of human potential; one of the biggest forces behind human culture and human communities; one of the biggest communication channels between people.

One doesn't need to be a professional artist to care about all that.

Comment by Q Home on Open Thread – Autumn 2023 · 2023-11-06T11:10:52.819Z · LW · GW

I think you're going for the most trivial interpretation instead of trying to explore interesting/unique aspects of the setup. (Not implying any blame. And those "interesting" aspects may not actually exist.) I'm not good at math, but not so bad as to not know the most basic 101 idea of multiplying utilities by probabilities.

I'm trying to construct a situation (X) where the normal logic of probability breaks down, because each possibility is embodied by a real person and all those persons are in a conflict with each other.

Maybe it's impossible to construct such a situation, for example because any normal situation can be modeled the same way (different people in different worlds who don't care about each other or even hate each other). But the possibility of such a situation is an interesting topic we could explore.

Here's another attempt to construct "situation X":

  • We have 100 persons.
  • 1 person has 99% chance to get big reward and 1% chance to get nothing. If they drink.
  • 99 persons each have 0.0001% chance to get big punishment and 99.9999% chance to get nothing.

Should a person drink? The answer "yes" is a policy which will always lead to exploiting 99 persons for the sake of 1 person. If all those persons hate each other, their implicit agreement to such a policy seems strange.
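To make the numbers concrete, here is a minimal Python sketch of the "multiply utilities by probabilities" answer for this setup. It rests on assumptions that are not in the setup itself: the reward and the punishment are given numeric sizes, and each person is assumed to be equally likely to be any of the 100 persons (i.e. nobody knows their role).

```python
from fractions import Fraction

P_REWARD_IF_LUCKY = Fraction(99, 100)        # 99% chance of the big reward
P_PUNISH_IF_OTHER = Fraction(1, 1_000_000)   # 0.0001% chance of the big punishment


def utility_by_role(reward, punishment):
    """Expected utility of the 'drink' policy for each role, if the role is known."""
    return {
        "lucky person": P_REWARD_IF_LUCKY * reward,
        "one of the 99": -P_PUNISH_IF_OTHER * punishment,
    }


def utility_if_role_unknown(reward, punishment):
    """Expected utility for someone equally likely to be any of the 100 persons."""
    by_role = utility_by_role(reward, punishment)
    return (Fraction(1, 100) * by_role["lucky person"]
            + Fraction(99, 100) * by_role["one of the 99"])


# Hypothetical sizes: reward and punishment both count as 1 unit.
print(utility_by_role(1, 1))
print(float(utility_if_role_unknown(1, 1)))
```

Under these assumptions the role-blind expected utility is positive, so the naive answer is "drink", even though for 99 of the 100 persons the policy has strictly negative expected utility once roles are known. That gap is the strange part I'm pointing at.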


Here's an explanation of what I'd like to explore from another angle.

Imagine I have a 99% chance to get reward and 1% chance to get punishment. If I take a pill. I'll take the pill. If we imagine that each possibility is a separate person, this decision can be interpreted in two ways:

  • 1 person altruistically sacrifices their well-being for the sake of 99 other persons.
  • 100 persons each think, egoistically, "I can get lucky". Only 1 person is mistaken.

And the same is true for other situations involving probability. But is there any situation (X) which could differentiate between "altruistic" and "egoistic" interpretations?
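One possible way to see why the two interpretations are hard to tell apart, sketched in Python. The formalization is an assumption of this sketch: "altruistic" is modeled as summing the outcomes of all 100 possibility-persons, "egoistic" as one person's expected outcome when they are equally likely to be any of them, and the outcome sizes (+1 and -1) are made up.

```python
from fractions import Fraction


def altruistic_total(outcomes):
    """Sum of outcomes over all possibility-persons (the group's view)."""
    return sum(outcomes)


def egoistic_expectation(outcomes):
    """Expected outcome for one person equally likely to be any of them."""
    return Fraction(sum(outcomes), len(outcomes))


def same_decision(outcomes):
    """Both views recommend taking the pill iff their value is positive."""
    return (altruistic_total(outcomes) > 0) == (egoistic_expectation(outcomes) > 0)


# Hypothetical pill: 99 possibility-persons get +1, one gets -1.
pill = [1] * 99 + [-1]
print(altruistic_total(pill), egoistic_expectation(pill), same_decision(pill))
```

In this formalization the egoistic value is just the altruistic total divided by the number of persons, so the two always agree in sign. A "situation X" would need some ingredient that breaks this proportionality.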

Comment by Q Home on Open Thread – Autumn 2023 · 2023-11-05T23:01:07.767Z · LW · GW

For all intents and purposes it's equivalent to say "you have only one shot" and after memory erasure it's not you anymore, but a person equivalent to other version of you next room.

Let's assume "it's not you anymore" is false. At least for a moment (even if it goes against LDT or something else).

Yes, you have a 0.1 chance of being punished. But who cares if they will erase your memory anyway.

Let's assume that the persons do care.

Comment by Q Home on Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it) · 2023-09-28T06:52:16.776Z · LW · GW

To me, the initial poll options make no sense without each other. For example, "avoid danger" and "communicate beliefs" don't make sense without each other [in context of society].

If people can't communicate (report epistemic state), "avoid danger" may not help or be based on 100% biased opinions on what's dangerous.

  • If some people solve Alignment, but don't communicate, humanity may perish due to not building a safe AGI.
  • If nobody solves Alignment, but nobody communicates about Alignment, humanity may perish because careless actors build an unsafe AGI without even knowing they do something dangerous.

I like communication, so I chose the second option. Even though "communicating without avoiding danger" doesn't make sense either.

Since the poll options didn't make much sense to me, I didn't see myself as "facing alien values" or "fighting off babyeaters". I didn't press the link, because I thought it may "blow up" the site (similar to the previous Petrov's Day) + I wasn't sure it's OK to click, I didn't think my unilateralism would be analogous to Petrov's unilateralism (did Petrov cure anyone's values, by the way?). I decided it's more Petrov-like to not click.


But is AGI (or anything else) related to the lessons of Petrov's Day? That's another can of worms. I think we should update the lessons of the past to fit the future situations. I think it doesn't make much sense to take away from Petrov's Day only lessons about "how to deal with launching nukes".

Another consideration: Petrov did accurately report his epistemic state. Or would have, if it were needed (if it were needed, he would lie to accurately report his epistemic state - "there are no launches"). Or "he accurately non-reported the non-presence of nuclear missiles".

Comment by Q Home on A Case for AI Safety via Law · 2023-09-22T02:31:46.640Z · LW · GW

Maybe you should edit the post to add something like this:

My proposal is not about the hardest parts of the Alignment problem. My proposal is not trying to solve theoretical problems with Inner Alignment or Outer Alignment (Goodhart, loopholes). I'm just assuming those problems won't be relevant enough. Or humanity simply won't create anything AGI-like (see CAIS).

Instead of discussing the usual problems in Alignment theory, I merely argue X. X is not a universally accepted claim, here's evidence that it's not universally accepted: [write the evidence here].

...

By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky's CEV.

I think the key problems are not "addressed", you just assume they won't exist. And laws are not a "practical implementation of CEV".

Comment by Q Home on A Case for AI Safety via Law · 2023-09-20T22:19:58.930Z · LW · GW

Maybe there's a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It's impossible to reject premise (1) without losing the proposal's meaning.

Premise (1) is possible to reject only if you're not solving Alignment but solving some other problem.

I'm arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary. My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems.

If an AI can be Aligned externally, then it's already safe enough. It feels like...

  • You're not talking about solving Alignment, but talking about some different problem. And I'm not sure what that problem is.
  • For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.

Comment by Q Home on A Case for AI Safety via Law · 2023-09-20T09:37:30.894Z · LW · GW

Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:

"For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with." (comment)

Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:

  • Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).

I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).

Later I write, "Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary."

If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. "Asking AI to follow something" is not what Bostrom means by direct specification, as far as I understand.

Comment by Q Home on Which Questions Are Anthropic Questions? · 2023-09-18T13:58:51.719Z · LW · GW

I like how you explain your opinion, very clear and short, basically contained in a single bit of information: "you're not a random sample" or "this equivalence between 2 classes of problems can be wrong".

But I think you should focus on describing the opinion of others (in simple/new ways) too. Otherwise you're just repeating yourself over and over.

If you're interested, I could try helping to write a simplified guide to ideas about anthropics.

Comment by Q Home on Some Thoughts on AI Art · 2023-09-18T13:54:05.360Z · LW · GW

Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.

What is the greater framework behind this argument? "Creating art" is one of the most general potentials a human being can realize. With your argument we could justify chopping off every human potential because "there's a greater amount of people who don't care about realizing it".

I think deleting a key human potential (and a shared cultural context) affects the entire society.

Comment by Q Home on Open Thread – Autumn 2023 · 2023-09-15T09:31:24.164Z · LW · GW

A stupid question about anthropics and [logical] decision theories. Could we "disprove" some types of anthropic reasoning based on [logical] consistency? I struggle with math, so please keep the replies relatively simple.

  • Imagine 100 versions of me, I'm one of them. We're all egoists, each one of us doesn't care about the others.
  • We're in isolated rooms, each room has a drink. 90 drinks are rewards, 10 drinks are punishments. Everyone is given the choice to drink or not to drink.
  • The setup is iterated (with memory erasure), everyone gets the same type of drink each time. If you got the reward, you get the reward each time. Only you can't remember that.

If I reason myself into drinking (reasoning that I have a 90% chance of reward), from the outside it would look as if 10 egoists have agreed (very conveniently, to the benefit of others) to suffer again and again... is it a consistent possibility?
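To make the setup concrete, here is a minimal Python sketch of the outside view. The payoff values (+1 for a reward, -1 for a punishment), the number of rounds and the function name `simulate` are my assumptions for illustration; the question itself doesn't specify them.

```python
def simulate(num_agents=100, num_punished=10, rounds=5, reward=1, punishment=-1):
    """Every copy follows the same policy: always drink.

    Drink types are fixed per room, so memory erasure changes nothing
    about who receives what. Payoff values are illustrative assumptions.
    """
    # Rooms 0..num_punished-1 hold punishments, the rest hold rewards.
    payoff_per_round = [
        punishment if room < num_punished else reward
        for room in range(num_agents)
    ]
    # Total payoff of each copy after all rounds.
    return [p * rounds for p in payoff_per_round]


totals = simulate()
winners = sum(1 for t in totals if t > 0)
losers = sum(1 for t in totals if t < 0)
print(winners, losers)  # 90 copies gain in every round, 10 lose in every round
```

From the inside, each copy sees a 90% chance of reward. From the outside, the same 10 copies lose in every single round. Whether that is a consistent outcome for egoists is exactly what I'm asking.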

Comment by Q Home on Why am I Me? · 2023-09-08T05:32:21.213Z · LW · GW

Let's look at actual outcomes here. If every human says yes, 95% of them get to the afterlife. If every human says no, 5% of them get to the afterlife. So it seems better to say yes in this case, unless you have access to more information about the world than is specified in this problem. But if you accept that it's better to say yes here, then you've basically accepted the doomsday argument.

There's a chance you're changing the nature of the situation by introducing Omega. Often "beliefs" and "betting strategy" go together, but here it may not be the case. You have to prove that the decision in the Omega game has any relation to any other decisions.

There's a chance this Omega game is only "an additional layer of tautology" which doesn't justify anything. We need to consider more games. I can suggest a couple of examples.

Game 1:

Omega: There are 2 worlds, one is much more populated than another. In the bigger one magic exists, in the smaller one it doesn't. Would you bet that magic exists in your world? Would you actually update your beliefs and keep that update?

One person can argue it becomes beneficial to "lie" about your beliefs/adopt temporal doublethink. Another person can argue for permanently changing your mind about magic.

Game 2:

Omega: I have this protocol. When you stand on top of a cliff, I give you a choice to jump or not. If you jump, you die. If you don't, I create many perfect simulations of this situation. If you jump in a simulation, you get a reward. Wanna jump?

You can argue "jumping means death, the reward is impossible to get". Unless you have access to true randomness which can vary across perfect copies of the situation. IDK. Maybe "making the Doomsday update beneficially" is impossible.

You did touch on exactly that, so I'm not sure how much my comment agrees with your opinions.

Comment by Q Home on H5N1. Just how bad is the situation? · 2023-07-09T11:40:46.478Z · LW · GW

The real question is will H5N1 pandemic happen in the next 5-10 years

2.4%

Sorry for a dumb question, but where do those numbers come from? What reasoning stands behind them? Is it some causal story ("jumping to humans is not that easy"), or priors ("pandemics are unlikely") or some precedent analysis ("it's not the first time a virus infects so many animal types")?

I really lack knowledge about viruses.

Comment by Q Home on H5N1. Just how bad is the situation? · 2023-07-09T02:00:07.217Z · LW · GW

What exactly, in our rational consideration, keeps the risk relatively low? Is it a prior that calamity-level pandemics happen rarely? Is it the fact (?) that today's situation is not that unique? Is it the hope that the virus can "back down", somehow? Is it some fact about general behavior of viruses?

What are the "cruxes" of "the risk is relatively low" prediction, what events would increase/decrease the risk and how much? For example, what happens with the probability if a lot of mammal-to-mammal transmissions start happening? Maybe I've missed it, but Zvi doesn't seem to address such points. I feel utterly confused. As if I'm missing an obvious piece of context which "nobody is talking about".

I have little knowledge about viruses. How unique is it for a virus to be deadly (and already a deadly threat for humans), epizootic (epidemic in non-humans) and panzootic (affecting animals of many species, especially over a wide area)? (From the Wikipedia article.)

The most naive, over-reactive and highly likely misinformed take would be "we are in a unique situation in history (in terms of viruses), more unique than Spanish flu and Black Death, because the latter weren't (?) widespread among non-humans. there are some dice rolls which separate us from disaster, but all possible dice rolls are now happening daily for days and months (and years)." ... What makes all the factors cash out into "anyway, the risk is relatively low, just one digit"? Here's an analogy: from a naive outside perspective, H5N1's "progress" may seem as impressive as ChatGPT. "This never (?) happened, but suddenly it happened and from this point on things can only escalate (probably)" - I guess for an outsider it's easy to get an impression like this. I feel confused because I'm not seeing it directly addressed.

Comment by Q Home on Ideas of the Gaps · 2023-07-04T05:16:00.375Z · LW · GW

So they overlook the simpler patterns because they pay less rent upfront, even though they are more general and a better investment long-term.

...

And if you use this metaphor to imagine what's going to happen to a tiny drop of water on a plastic table, you could predict that it will form a ball and refuse to spread out. While the metaphor may only be able to generate very uncertain & imprecise predictions, it's also more general.

Can you expand on this thought ("something can give less specific predictions, but be more general") or reference famous/professional people discussing it? This thought can be very trivial, but it also can be very controversial.

Right now I'm writing a post about "informal simplicity", "conceptual simplicity". It discusses simplicity of informal concepts (concepts not giving specific predictions). I make an argument that "informal simplicity" should be very important a priori. But I don't know if "informal simplicity" was used (at least implicitly) by professional and famous people. Here's as much as I know: (warning, controversial and potentially inaccurate takes!)

  • Zeno of Elea made arguments basically equivalent to "calculus should exist" and "theory of computation should exist" ("supertasks are a thing") using only the basic math.

  • The success of neural networks is a success of one of the simplest mechanisms: backpropagation and attention. (Even though they can be heavy on math too.) We observed a complicated phenomenon (real neurons), we simplified it... and BOOM!

  • Arguably, many breakthroughs in early and late science were sealed behind simple considerations (e.g. equivalence principle), not deduced from formal reasoning. Feynman diagrams weren't deduced from some specific math, they came from the desire to simplify.

  • Some fields "simplify each other" in some way. Physics "simplifies" math (via physical intuitions). Computability theory "simplifies" math (by limiting it to things which can be done by series of steps). Rationality "simplifies" philosophy (by connecting it to practical concerns) and science.

  • To learn flying, Wright brothers had to analyze "simple" considerations.

  • Eliezer Yudkowsky influenced many people with very "simple" arguments. The rationalist community as a whole is a "simplified" approach to philosophy and science (to a degree).

  • The possibility of a logical decision theory can be deduced from simple informal considerations.

  • Albert Einstein used simple thought experiments.

  • Judging by the famous video interview, Richard Feynman likes to think about simple informal descriptions of physical processes. And maybe Feynman talked about "less precise, but more general" idea? Maybe he said that epicycles were more precise, but a heliocentric model was better anyway? I couldn't find it.

  • Terry Tao occasionally likes to simplify things. (e.g. P=NP and multiple choice exams, Quantum mechanics and Tomb Raider, Special relativity and Middle-Earth and Calculus as “special deals”). Is there more?

  • Some famous scientists weren't shying away from philosophy (e.g. Albert Einstein, Niels Bohr?, Erwin Schrödinger).

Please, share any thoughts or information relevant to this, if you have any! It's OK if you write your own speculations/frames.

Comment by Q Home on Three levels of exploration and intelligence · 2023-03-16T22:17:14.668Z · LW · GW

If you have a flexible enough representation then you can use it to represent anything, unfortunately you've also gutted it of predictive power (vs post hoc explanation).

I think this can be wrong:

  1. "Y" and "D" are not empty symbols, they come with an objective enough metric (the metric of "general importance"). So, it's like saying that "A" and "B" in the Bayes' theorem are empty symbols without predictive power. And I believe the analogy with Bayes' theorem is not accidental, by the way, because I think you could turn my idea into a probabilistic inference rule.
  2. If my method can't help to predict good ideas, it still can have predictive power if it evaluates good ideas correctly (before they get universally recognized as good). Not every important idea is immediately recognized as important.

Can you expand on the connection with Leverage Points? Seems like 12 Leverage Points is an extremely specific and complicated idea (doesn't mean it can't be good in its own field, though).

Comment by Q Home on Q Home's Shortform · 2023-03-08T22:26:45.614Z · LW · GW

*A more "formal" version of the draft (it's a work in progress):*

There are two interpretations of this post, weak and strong.

Weak interpretation:

I describe a framework about "three levels of exploration". I use the framework to introduce some of my ideas. I hope that the framework will give more context to my ideas, making them more understandable. I simply want to find people who are interested in exploring ideas. Exploring just for the sake of exploring or for a specific goal.

Strong interpretation:

I use the framework as a model of intelligence. I claim that any property of intelligence boils down to the "three levels of exploration". Any talent, any skill. The model is supposed to be "self-evident" because of its simplicity, it's not based on direct analysis of famous smart people.

Take the strong interpretation with a lot of grains of salt, of course, because I'm not an established thinker and I haven't achieved anything intellectual. I just thought "hey, this is a funny little simple idea, what if all intelligence works like this?", that's all.

That said, I'll need to make a couple of extraordinary claims "from inside the framework" (i.e. assuming it's 100% correct and 100% useful). Just because that's in the spirit of the idea. Just because it allows exploring the idea to its logical conclusion. Definitely not because I'm a crazy man. You can treat the most outlandish claims as sci-fi ideas.

A formula of thinking?

Can you "reduce" thinking to a single formula? (Sounds like cringe and crackpottery!)

Can you show a single path of the best and fastest thinking?

Well, there's an entire class of ideas which attempt to do this in different fields, especially the first idea:

My idea is just another attempt at reduction. You don't have to treat such attempts 100% seriously in order to find value in them. You don't have to agree with them.

Three levels of exploration

Let's introduce my framework.

In any topic, there are three levels of exploration:

  1. You study a single X.
  2. You study types of different X. Often I call those types "qualities" of X.
  3. You study types of changes (D): in what ways different X change/get changed by a new thing Y. Y and D need to be important even outside of the (main) context of X.

The point is that at the 2nd level you study similarities between different X directly, but at the 3rd level you study similarities indirectly through new concepts Y and D. The letter "D" means "dynamics".

I claim that any property of intelligence can be boiled down to your "exploration level". Any talent, any skill and even more vague things such as "level of intentionality". I claim that the best and most likely ideas come from the 3rd level. That 3rd level defines the absolute limit of currently conceivable ideas. So, it also indirectly defines the limit of possible/conceivable properties of reality.

You don't need to trust those extraordinary claims. If the 3rd level simply sounds interesting enough to you and you're ready to explore it, that's good enough.

Three levels simplified

A vague description of the three levels:

  1. You study objects.
  2. You study qualities of objects.
  3. You study changes of objects.

Or:

  1. You study a particular thing.
  2. You study everything.
  3. You study abstract ways (D) in which the thing is changed by "everything".

Or:

  1. You study a particular thing.
  2. You study everything.
  3. You study everything through a particular thing.

So yeah, it's a Hegelian dialectic rip-off. Down below are examples of applying my framework to different topics. You don't need to read them all, of course.


Exploring debates

1. Argumentation

I think there are three levels of exploring arguments:

  1. You judge arguments as right or wrong. Smart or stupid.
  2. You study types of arguments. Without judgement.
  3. You study types of changes (D): how arguments change/get changed by some new thing Y. ("dynamics" of arguments)

If you want to get a real insight about argumentation, you need to study how (D) arguments change/get changed by some new thing Y. D and Y need to be important even outside of the context of explicit argumentation.

For example, Y can be "concepts". And D can be "connecting/separating" (a fundamental process which is important in a ton of contexts). You can study in what ways arguments connect and separate concepts.

A simplified political example: a capitalist can tend to separate concepts ("bad things are caused by mistakes and bad actors"), while a socialist can tend to connect concepts ("bad things are caused by systemic problems"). Conflict Vs. Mistake^(1) is just a very particular version of this dynamic. Different manipulations with concepts create different arguments and different points of view. You can study all such dynamics. You can trace arguments back to fundamental concept manipulations. It's such a basic idea and yet nobody has done it. Aristotle did it 2400 years ago, but for formal logic.

^(1. I don't agree with Scott Alexander, by the way.)

Arguments: conclusion

I think most of us are at level 1 in argumentation: we throw arguments at each other like angry cavemen without studying what an "argument" is and/or what dynamics it creates. If you completely unironically think that "stupid arguments" exist, then you're probably on the 1st level. Professional philosophers are at level 2 at best, but usually lower (they are surprisingly judgemental). At least they are somewhat forced to be tolerant of the most diverse types of arguments due to their profession.

On what level are you? Have you studied arguments without judgement?

2. Understanding/empathy

I think there are three levels in understanding your opponent:

  1. You study a specific description (X) of your opponent's opinion. You can pass the Ideological Turing Test in a superficial way. Like a parrot.
  2. You study types of descriptions of your opponent's opinion. ("Qualities" of your opponent's opinion.) You can "inhabit" the emotions/mindset of your opponent.
  3. You study types of changes (D): how the description of your opponent's opinion changes/gets changed by some new thing Y. D and Y need to be important even outside of debates.

For example, Y can be "copies of the same thing" and D can be "transformations of copies into each other". Such Y and D are important even outside of debates.

So, on the 3rd level you may be able to describe the opponent's position as a weaker version/copy of your own position (Y) and clearly imagine how your position could turn out to be "the weaker version/copy" of the opponent's views. You can imagine how the opponent's opinion transforms into truth and your opinion transforms into a falsehood (D).

Other interesting choices of Y and D are possible. For example, Y can be "complexity of the opinion [in a given context]"; D can be "choice of the context" and "increasing/decreasing of complexity". You can run the opinion of your opponent through different contexts and see how much it reacts to/accommodates the complexity of the world.

Empathy: conclusion

I think people very rarely do the 3rd level of empathy.

Doing it systematically would lead to a new political/epistemological paradigm.


Exploring philosophy

1. Beliefs and ontology

I think there are three levels of studying the connection between beliefs and ontology:

  1. You think you can see the truth of a belief directly. For example, you can say "all beliefs which describe reality in a literal way are true". You get stuff like Naïve Realism. "Reality is real."
  2. You study types of beliefs. You can say that all beliefs of a certain type are true. For example, "all mathematical beliefs are true". You get stuff like Mathematical Universe Hypothesis, Platonism, Ontic Structural Realism... "Some description of reality is real."
  3. You study types of changes (D): how beliefs change/get changed by some new thing Y. You get stuff like Berkeley’s subjective idealism and radical probabilism and Bayesian epistemology: the world of changing ideas. "Some changing description of reality is real."

What can D and Y be? Both things need to be important even outside of the context of explicit beliefs. A couple of versions:

  • Y can be "semantic connections". D can be "connecting/separating [semantic connections]". Both things are generally important, for example in linguistics, in studying semantic change. We get Berkeley's idealism.
  • Y can be "probability mass" or some abstract "weight". D can be "distribution of the mass/weight". We get probabilism/Bayesianism.
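To illustrate the second bullet, here is a minimal Python sketch of a Bayesian update seen as redistribution (D) of probability mass (Y) between hypotheses. The hypothesis names and the numbers are made-up assumptions, and `redistribute` is just an illustrative helper name.

```python
def redistribute(prior, likelihood):
    """Bayes' rule as moving probability mass between hypotheses.

    prior: hypothesis -> mass before the evidence (sums to 1).
    likelihood: hypothesis -> P(evidence | hypothesis).
    """
    # Scale each hypothesis' mass by how well it predicted the evidence.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    # Renormalize so the total mass stays 1: mass is moved, not created.
    return {h: w / total for h, w in unnormalized.items()}


# Hypothetical numbers, chosen only for illustration.
prior = {"A": 0.5, "B": 0.5}
likelihood = {"A": 0.8, "B": 0.2}
posterior = redistribute(prior, likelihood)
print(posterior)
```

The total amount of mass never changes, only its distribution does. In that sense the "real" object here is the changing description, not any single belief.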

Thinking at the level of semantic connections should be natural to people, because they use natural language and... neural nets in their brains! (Berkeley makes a similar argument: "hey, folks, this is just common sense!") And yet this idea is extremely alien to people epistemology-wise and ontology-wise. I think the true potential of the 3rd level remains unexplored.

Beliefs: conclusion

I think most rationalists (Bayesians, LessWrong people) are "confused" between the 2nd level and the 1st level, even though they have some 3rd level tools.

Eliezer Yudkowsky is "confused" between the 1st level and the 3rd level: he likes level 1 ideas (e.g. "map is not the territory"), but has a bunch of level 3 ideas ("some maps are the territory") about math, probability, ethics, decision theory, Security Mindset...

2. Ontology and reality

I think there are three levels of exploring the relationship between ontologies and reality:

  1. You think that an ontology describes the essence of reality.
  2. You study how different ontologies describe different aspects of reality.
  3. You study types of changes (D): how ontologies change/get changed by some other concept Y. D and Y need to be important even outside of the topic of (pure) ontology.

Y can be "human minds" or simply "objects". D can be "matching/not matching" or "creating a structure" (two very basic, but generally important processes). You get Kant's "Copernican revolution" (reality needs to match your basic ontology, otherwise information won't reach your mind: but there are different types of "matching" and transcendental idealism defines one of the most complicated ones) and Ontic Structural Realism (ontology is not about things, it's about structures created by things) respectively.

On what level are you? Have you studied ontologies/epistemologies without judgement? What are the most interesting ontologies/epistemologies you can think of?

3. Philosophy overall

I think there are three levels of doing philosophy in general:

  1. You try to directly prove an idea in philosophy using specific philosophical tools.
  2. You study types of philosophical ideas.
  3. You study types of changes (D): how philosophical ideas change/get changed by some other thing Y. D and Y need to be important even outside of (pure) philosophy.

To give a bunch of examples, Y can be:

I think people did a lot of 3rd level philosophy, but we haven't fully committed to the 3rd level yet. We are used to treating philosophy as a closed system, even when we make significant steps outside of that paradigm.


Exploring ethics

1. Commitment to values

I think there are three levels of values:

  1. Real values. You treat your values as particular objects in reality.

  2. Subjective values. You care only about things inside of your mind. For example, do you feel good or not?

  3. Semantic values. You care about types of changes (D): how your values change/get changed by reality (Y). Your value can be expressed as a combination of the three components: "a real thing + its meaning + changes".

Example of a semantic value: you care about your friendship with someone. You will try to preserve the friendship. But in a limited way: you're ready that one day the relationship may end naturally (your value may "die" a natural death). Semantic values are temporal and path-dependent. Semantic values are like games embedded in reality: you want to win the game without breaking the rules.

2. Ethics

I think there are three levels of analyzing ethics:

  1. You analyze norms of specific communities and desires of specific people. That's quite easy: you are just learning facts.
  2. You analyze types of norms and desires. You are lost in contradictory implications, interpretations and generalizations of people's values. You have a meta-ethical paralysis.
  3. You study types of changes (D): how norms and desires change/get changed by some other thing Y. D and Y need to be important even outside of (purely) ethical context.

Ethics: tasks and games

For example, Y can be "tasks, games, activities" and D can be "breaking/creating symmetries". You can study how norms and desires affect properties of particular activities.

Let's imagine an Artificial Intelligence or a genie who fulfills our requests (it's a "game" between us). We can analyze how bad actions of the genie can break important symmetries of the game. Let's say we asked it to make us a cup of coffee:

  • If it killed us after making the coffee, we can't continue the game. And we ended up with less than we had before. And we wouldn't make the request if we knew that's gonna happen. And the game can't be "reversed": the players are dead.

  • If it has taken us under mind control, we can't affect the game anymore (and it gained 100% control over the game). If it placed us into a delusion, then the state of the game can be arbitrarily affected (by dissolving the illusion). And depends on perspective.

  • If it made us addicted to coffee, we can't stop or change the game anymore. And the AI/genie drastically changed the nature of the game without our consent. It changed how the "coffee game" relates to all other games, skewed the "hierarchy of games".

Those are all "symmetry breaks". And such symmetry breaks are bad in most of the tasks.

Ethics: Categorical Imperative

With Categorical Imperative, Kant explored a different choice of Y and D. Now Y is "roles of people", "society" and "concepts"; D is "universalization" and "becoming incoherent/coherent" and other things.

Ethics: Preferences

If Y is "preferences" and D is "averaging", we get Preference utilitarianism. (Preferences are important even outside of ethics and "averaging" is important everywhere.) But this idea is too "low-level" to use in analysis of ethics.

However, if Y is "versions of an abstract preference" and D is "splitting a preference into versions" and "averaging", then we get a high-level analog of preference utilitarianism. For example, you can take an abstract value such as Bodily autonomy and try to analyze the entirety of human ethics as an average of versions (specifications) of this abstract value.

Preference utilitarianism reduces ethics to an average of micro-values, the idea above reduces ethics to an average of a macro-value.
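A toy Python sketch of the difference, under heavy assumptions: the two actions, all the scores and the two "versions" of an abstract value are hypothetical placeholders, and real preferences are obviously not this simple. The point is only that D ("averaging") stays the same while Y changes.

```python
from statistics import mean


def average_score(action, evaluators):
    """Average the scores that a set of evaluators assigns to an action."""
    return mean(evaluate(action) for evaluate in evaluators)


# Micro-values: one scoring function per person (hypothetical numbers).
people = [
    lambda action: {"share": 1.0, "hoard": 0.0}[action],
    lambda action: {"share": 0.5, "hoard": 1.0}[action],
]

# Macro-value: several versions (specifications) of ONE abstract value
# (hypothetical numbers again).
versions_of_value = [
    lambda action: {"share": 1.0, "hoard": 0.5}[action],
    lambda action: {"share": 0.5, "hoard": 0.0}[action],
]

for action in ("share", "hoard"):
    print(action,
          average_score(action, people),
          average_score(action, versions_of_value))
```

In the first case you average over many people's separate preferences. In the second case you average over many readings of a single value.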

Ethics: conclusion

So, what's the point of the 3rd level of analyzing ethics? The point is to find objective sub-structures in ethics where you can apply deduction to exclude the most "obviously awful" and "maximally controversial and irreversible" actions. The point is to "derive" ethics from much more broad topics, such as "meaningful games" and "meaningful tasks" and "coherence of concepts".

I think:

  • Moral philosophers and Alignment researchers are ignoring the 3rd level. People are severely underestimating how much they know about ethics.
  • Acknowledging the 3rd level doesn't immediately solve Alignment, but it can "solve" ethics or the discourse around ethics. Empirically: just study properties of tasks and games and concepts!
  • Eliezer Yudkowsky has limited 3rd level understanding of meta-ethics ("Abstracted Idealized Dynamics", "Morality as Fixed Computation", "The Bedrock of Fairness") but misses that he could make his idea more broad.
  • Particularism (in ethics and reasoning in general) could lead to the 3rd level understanding of ethics.

Exploring perception

1. Properties

There are three levels of looking at properties of objects:

  1. Inherent properties. You treat objects as having more or less inherent properties. E.g. "this person is inherently smart"

  2. Meta-properties. You treat any property as universal. E.g. "anyone is smart under some definition of smartness"

  3. Semantic properties. You treat properties only as relatively attached to objects. You focus on types of changes (D): how properties and their interpretations change/get changed by some other thing Y. You "reduce" properties to D and Y. E.g. "anyone can be a genius or a fool under certain important conditions" or "everyone is smart, but in a unique and important way"

2. Commitment to experiences and knowledge

I think there are three levels of commitment to experiences:

  1. You're interested in particular experiences.

  2. You want to explore all possible experiences.

  3. You're interested in types of changes (D): how your experience changes/gets changed by some other thing Y. D and Y need to be important even outside of experience.

So, on the 3rd level you care about interesting ways (D) in which experiences correspond to reality (Y).

3. Experience and morality

I think there are three levels of investigating the connection between experience and morality:

  1. You study how experience causes us to do good or bad things.
  2. You study all the different experiences "goodness" and "badness" causes in us.
  3. You study types of changes (D): how your experience changes/gets changed by some other thing Y. D and Y need to be important even outside of experience. But related to morality anyway.

For example, Y can be "[basic] properties of concepts" and D can be "matches / mismatches [between concepts and actions towards them]". You can study how experience affects properties of concepts which in turn bias actions. An example of such analysis: "loving a sentient being feels fundamentally different from eating a sandwich. food taste is something short and intense, but love can be eternal and calm. this difference helps to not treat other sentient beings as something disposable"

I think the existence of the 3rd level isn't acknowledged much. Most versions of moral sentimentalism are 2nd level at best. Epistemic Sentimentalism can be 3rd level in the best case.


Exploring cognition

1. Patterns

I think there are three levels of [studying] patterns:

  1. You study particular patterns (X). You treat patterns as objective configurations in reality.
  2. You study all possible patterns. You treat patterns as subjective qualities of information, because most patterns are fake.
  3. You study types of changes (D): how patterns change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) pattern analysis. You treat a pattern as a combination of the three components: "X + Y + D".

For example, Y can be "pieces of information" or "contexts": you can study how patterns get discarded or redefined (D) when new information gets revealed/new contexts get considered.

You can study patterns which are "objective", but exist only in a limited context. For example, think about your friend's bright personality (personality = a pattern). It's an "objective" pattern, and yet it exists only in a limited context: the pattern would dissolve if you compared your friend to all possible people. Or if you saw your friend in all possible situations they could end up in. Your friend's personality has some basis in reality (X), has a limited domain of existence (Y) and the potential for change (D).

2. Patterns and causality

I think there are three levels in the relationship between patterns and causality. I'm going to give examples about visual patterns:

  1. You learn which patterns are impossible due to local causal processes. For example: "I'm unlikely to see a big tower made of eggs standing on top of each other". It's just not a stable situation due to very familiar laws of physics.

  2. You learn statistical patterns (correlations) which can have almost nothing to do with causality. For example: "people like to wear grey shirts".

  3. You learn types of changes (D): how patterns change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) pattern analysis. And related to causality.

Y can be "basic properties of images" and "basic properties of patterns"; D can be "sharing properties" and "keeping the complexity the same". In simpler words:

On the 3rd level you learn patterns which have strong connections to other patterns and basic properties of images. You could say such patterns are created/prevented by "global" causal processes. For example: "I'm unlikely to see a place fully filled with dogs. dogs are not people or birds or insects, they don't create such crowds or hordes". This is very abstract, connects to other patterns and basic properties of images.

Causality: implications for Machine Learning

I think...

  • It's likely that Machine Learning models don't learn 3rd level patterns as well as they could, as sharply as they could.
  • Machine Learning models should be 100% able to learn 3rd level patterns. It shouldn't require any specific data.
  • Learning/comparing level 3 patterns is interesting enough on its own. It could be its own area of research. But we don't apply statistics/Machine Learning to try mining those patterns. This may be a missed opportunity for humans.

3. Cognitive processes

Suppose you want to study different cognitive processes, skills, types of knowledge. There are three levels:

  1. You study particular cognitive processes.

  2. You study types (qualities) of cognitive processes. And types of types (classifications).

  3. You study types of changes (D): how cognitive processes change/get changed by some other thing Y. D and Y need to be important even without the context of cognitive processes.

For example, Y can be "fundamental configurations / fundamental objects" and D can be "finding a fundamental configuration/object in a given domain". You can "reduce" different cognitive processes to those Y and D (the names of the processes below shouldn't be taken 100% literally):

^(1 "fundamental" means "VERY widespread in a certain domain")

  • Causal reasoning learns fundamental configurations of fundamental objects in the real world. So you can learn stuff like "this abstract rule applies to most objects in the world".
  • Symbolic reasoning learns fundamental configurations of fundamental objects in your "concept space". So you can learn stuff like "'concept A containing concept B' is an important pattern" (see set relations).
  • Correlational reasoning learns specific configurations of specific objects.
  • Mathematical reasoning learns specific configurations of fundamental objects. So you can build arbitrary structures with abstract building blocks.
  • Self-aware reasoning can transform fundamental objects into specific objects. So you can think thoughts like, for example, "maybe I'm just a random person with random opinions" (you consider your perspective as non-fundamental) or "maybe the reality is not what it seems".

I know, this looks "funny", but I think all this could be formalized easily enough. Isn't that a natural way to study types of reasoning? Just ask what knowledge a certain type of reasoning learns!
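To make the list above a bit more concrete, here is a minimal sketch of one way it might be written down. It is only an illustration under my own assumptions, not a proposed formalization: the names `Scope`, `Knowledge`, `LEARNS`, `self_aware` and `same_shape` are all hypothetical, and so is the "unspecified" domain label for the two processes whose domain the list doesn't name.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Scope(Enum):
    FUNDAMENTAL = "fundamental"  # "VERY widespread in a certain domain"
    SPECIFIC = "specific"


@dataclass(frozen=True)
class Knowledge:
    """What a type of reasoning learns (hypothetical encoding)."""
    configurations: Scope
    objects: Scope
    domain: str  # illustrative label; "unspecified" where the list names none


# The four processes that "learn" something, as described in the list.
LEARNS = {
    "causal": Knowledge(Scope.FUNDAMENTAL, Scope.FUNDAMENTAL, "real world"),
    "symbolic": Knowledge(Scope.FUNDAMENTAL, Scope.FUNDAMENTAL, "concept space"),
    "correlational": Knowledge(Scope.SPECIFIC, Scope.SPECIFIC, "unspecified"),
    "mathematical": Knowledge(Scope.SPECIFIC, Scope.FUNDAMENTAL, "unspecified"),
}


def self_aware(k: Knowledge) -> Knowledge:
    """Self-aware reasoning as a transformation rather than a kind of
    knowledge: fundamental objects get treated as specific ones."""
    return replace(k, objects=Scope.SPECIFIC)


def same_shape(a: Knowledge, b: Knowledge) -> bool:
    """True if two kinds of knowledge differ only in their domain."""
    return (a.configurations, a.objects) == (b.configurations, b.objects)
```

In this toy encoding, causal and symbolic reasoning have the same "shape" and differ only in domain, and `self_aware` maps the mathematical entry onto the correlational one. That second result is a by-product of the encoding I chose, so treat it as a curiosity rather than a claim.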


Exploring theories

1. Science

I think there are three ways of doing science:

  1. You predict a specific phenomenon.

  2. You study types of phenomena. (qualities of phenomena)

  3. You study types of changes (D): how the phenomenon changes/gets changed by some other thing Y. D and Y need to be important even outside of this phenomenon.

Imagine you want to explain combustion (why/how things burn):

  1. You try to predict combustion. This doesn't work, because you already know "everything" about burning and there are many possible theories. You end up making things up because there's not enough new data.
  2. You try to compare combustion to other phenomena. You end up fantasizing about imaginary qualities of the phenomenon. At this level you get something like theories of "classical elements" (fantasies about superficial similarities).
  3. You find or postulate a new thing (Y) which affects/gets affected (D) by combustion. Y and D need to be important in many other phenomena. If Y is "types of matter" and D is "releasing / absorbing", this gives you phlogiston theory. If Y is "any matter" and D is "conservation of mass" and "any transformations of matter", you get Lavoisier's theory. If Y is "small pieces of matter (atoms)" and D is "atoms hitting each other", you get the kinetic theory of gases.

So, I think phlogiston theory was a step in the right direction, but it failed because the choice of Y and D wasn't abstract enough.

I think most significant scientific breakthroughs require level 3 ideas. Partially "by definition": if a breakthrough is not "level 3", then it means it's contained in a (very) specific part of reality.

2. Math

I think there are three ways of doing math:

  1. You explore specific mathematical structures.

  2. You explore types of mathematical structures. And types of types. And typologies. At this level you may get something like Category theory.

  3. You study types of changes (D): how equations change/get changed by some other thing Y. D and Y need to be important even outside of (explicit) math.

Mathematico-philosophical insights

Let's look at math through the lens of the 3rd level:

All concepts above are "3rd level". But we can classify them, creating new three levels of exploration (yes, this is recursion!). Let's do this. I think there are three levels of mathematico-philosophical concepts:

  1. Concepts that change the properties of things we count. (e.g. topology, fractals, graph theory)
  2. Concepts that change the meaning of counting. (e.g. probability, computation, utility, sets, group theory, Gödel's incompleteness theorems and Tarski's undefinability theorem)
  3. Concepts that change the essence of counting. (e.g. Calculus, vectors, probability, actual infinity, fractal dimensions)

So, Calculus is really "the king of kings" and "the insight of insights". 3rd level of the 3rd level.

3. Physico-philosophical insights

I would classify physico-philosophical concepts as follows:

  1. Concepts that change the way movement affects itself. E.g. Net force, Wave mechanics, Huygens–Fresnel principle

  2. Concepts that change the "meaning" of movement. E.g. the idea of reference frames (principles of relativity), curved spacetime (General Relativity), the idea of "physical fields" (classical electromagnetism), conservation laws and symmetries, predictability of physical systems.

  3. Concepts that change the "essence" of movement, the way movement relates to basic logical categories. E.g. properties of physical laws and theories (Complementarity; AdS/CFT correspondence), the beginning/existence of movement (cosmogony, "why is there something rather than nothing?", Mathematical universe hypothesis), the relationship between movement and infinity (Supertasks) and computation/complexity, the way "possibility" spreads/gets created (Quantum mechanics, Anthropic principle), the way "relativity" gets created (Mach's principle), the absolute mismatch between perception and the true nature of reality (General Relativity, Quantum Mechanics), the nature of qualia and consciousness (Hard problem of consciousness), the possibility of a Theory of everything and the question "how far can you take [ontological] reductionism?", the nature of causality and determinism, the existence of space and time and matter and their most basic properties, interpretation of physical theories (interpretations of quantum mechanics).


Exploring meta ideas

To define "meta ideas" we need to think about many pairs of "Y, D" simultaneously. This is the most speculative part of the post. Remember, you can treat those speculations simply as sci-fi ideas.

Each pair of abstract concepts (Y, D) defines a "language" for describing reality. And there's a meta-language which connects all those languages. Or rather, there are many meta-languages. Each meta-language can be described by a pair of abstract concepts (Y, D) too.

^(Instead of "languages" I could use the word "models". But I wanted to highlight that those "models" don't have to be formal in any way.)

I think the idea of "meta-languages" can be used to analyze:

  • Consciousness. You can say that consciousness is "made of" multiple abstract interacting languages. On the one hand this is just a trivial description of consciousness; on the other hand it might have deeper implications.
  • Qualia. You can say that qualia are "made of" multiple abstract interacting languages. On the one hand this is a trivial idea ("qualia are the sum of your associations"); on the other hand this formulation adds important specific details.
  • The ontology of reality. You can argue that our ways to describe reality ("physical things" vs. purely mathematical concepts, subjective experience vs. physical world, high-level patterns vs. complete reductionism, physical theory vs. philosophical ontology) all conflict with each other and lead to paradoxes when taken to the extreme, but can't exist without each other. Maybe they are all intertwined?
  • Meta-ethics. You can argue that concepts like "goodness" and "justice" can't be reduced to any single type of definition. So, you can try to reduce them to a synthesis of many abstract languages. See G. E. Moore's ideas about indefinability: the naturalistic fallacy, the open-question argument.

According to the framework, ideas about "meta-languages" define the limit of conceivable ideas.

If you think about it, this is actually quite a trivial statement: "meta-models" (consisting of many normal models) are the limit of conceivable models. Your entire conscious mind is such a "meta-model". If no model works for describing something, then a "meta-model" is your last resort. On the one hand, "meta-models" are a very trivial idea^(1); on the other hand, nobody ever cared to explore the full potential of the idea.

^(1 for example, we have a "meta-model" of physics: a combination of two wrong theories, General Relativity and Quantum Mechanics.)

Nature of percepts

I talked about qualia in general. Now I just want to throw out my idea about the nature of particular percepts.

There are theories and concepts which link percepts to "possible actions" and "intentions": see Affordance. I like such ideas, because I like to think about types of actions.

So I have a variation of this idea: I think that any percept gets created by an abstract dynamic (Y, D) or many abstract dynamics. Any (important) percept corresponds to a unique dynamic. I think abstract dynamics bind concepts.

^(But I have only started to think about this. I share it anyway because I think it follows from all the other ideas.)

P.S.

Thank you for reading this.

If you want to discuss the idea, please focus on the idea itself and its particular applications. Or on exploring particular topics!