Posts

Yann LeCun: We only design machines that minimize costs [therefore they are safe] 2024-06-15T17:25:59.973Z
DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking 2024-06-10T21:20:11.938Z
Each Llama3-8b text uses a different "random" subspace of the activation space 2024-05-22T07:31:32.764Z
Is deleting capabilities still a relevant research question? 2024-05-21T13:24:44.946Z
Why I stopped being into basin broadness 2024-04-25T20:47:17.288Z
Blessed information, garbage information, cursed information 2024-04-18T16:56:17.370Z
Ackshually, many worlds is wrong 2024-04-11T20:23:59.416Z
[GPT-4] On the Gradual Emergence of Mechanized Intellect: A Treatise from the Year 1924 2024-04-01T19:14:02.363Z
Opinions survey 2 (with rationalism score at the end) 2024-02-17T12:03:02.410Z
Opinions survey (with rationalism score at the end) 2024-02-17T00:41:20.188Z
What are the known difficulties with this alignment approach? 2024-02-11T22:52:18.900Z
Against Nonlinear (Thing Of Things) 2024-01-18T21:40:00.369Z
Which investments for aligned-AI outcomes? 2024-01-04T13:28:57.198Z
Practically A Book Review: Appendix to "Nonlinear's Evidence: Debunking False and Misleading Claims" (ThingOfThings) 2024-01-03T17:07:13.990Z
Could there be "natural impact regularization" or "impact regularization by default"? 2023-12-01T22:01:46.062Z
Utility is not the selection target 2023-11-04T22:48:20.713Z
Contra Nora Belrose on Orthogonality Thesis Being Trivial 2023-10-07T11:47:02.401Z
What are some good language models to experiment with? 2023-09-10T18:31:50.272Z
Aumann-agreement is common 2023-08-26T20:22:03.738Z
A content analysis of the SQ-R questionnaire and a proposal for testing EQ-SQ theory 2023-08-09T13:51:02.036Z
If I showed the EQ-SQ theory's findings to be due to measurement bias, would anyone change their minds about it? 2023-07-29T19:38:13.285Z
Autogynephilia discourse is so absurdly bad on all sides 2023-07-23T13:12:07.982Z
Boundary Placement Rebellion 2023-07-20T17:40:00.190Z
Prospera-dump 2023-07-18T21:36:13.822Z
Are there any good, easy-to-understand examples of cases where statistical causal network discovery worked well in practice? 2023-07-12T22:08:59.916Z
I think Michael Bailey's dismissal of my autogynephilia questions for Scott Alexander and Aella makes very little sense 2023-07-10T17:39:26.325Z
What in your opinion is the biggest open problem in AI alignment? 2023-07-03T16:34:09.698Z
Which personality traits are real? Stress-testing the lexical hypothesis 2023-06-21T19:46:03.164Z
Book Review: Autoheterosexuality 2023-06-12T20:11:38.215Z
How accurate is data about past earth temperatures? 2023-06-09T21:29:11.852Z
[Market] Will AI xrisk seem to be handled seriously by the end of 2026? 2023-05-25T18:51:49.184Z
Horizontal vs vertical generality 2023-04-29T19:14:35.632Z
Core of AI projections from first principles: Attempt 1 2023-04-11T17:24:27.686Z
Is this true? @tyler_m_john: [If we had started using CFCs earlier, we would have ended most life on the planet] 2023-04-10T14:22:07.230Z
Is this true? paulg: [One special thing about AI risk is that people who understand AI well are more worried than people who understand it poorly] 2023-04-01T11:59:45.038Z
What does the economy do? 2023-03-24T10:49:33.251Z
Are robotics bottlenecked on hardware or software? 2023-03-21T07:26:52.896Z
What problems do African-Americans face? An initial investigation using Standpoint Epistemology and Surveys 2023-03-12T11:42:32.614Z
What do you think is wrong with rationalist culture? 2023-03-10T13:17:28.279Z
What are MIRI's big achievements in AI alignment? 2023-03-07T21:30:58.935Z
Coordination explosion before intelligence explosion...? 2023-03-05T20:48:55.995Z
Prediction market: Will John Wentworth's Gears of Aging series hold up in 2033? 2023-02-25T20:15:11.535Z
Somewhat against "just update all the way" 2023-02-19T10:49:20.604Z
Latent variables for prediction markets: motivation, technical guide, and design considerations 2023-02-12T17:54:33.045Z
How many of these jobs will have a 15% or more drop in employment plausibly attributable to AI by 2031? 2023-02-12T15:40:02.999Z
Do IQ tests measure intelligence? - A prediction market on my future beliefs about the topic 2023-02-04T11:19:29.163Z
What is a disagreement you have around AI safety? 2023-01-12T16:58:10.479Z
Latent variable prediction markets mockup + designer request 2023-01-08T22:18:36.050Z
Where do you get your capabilities from? 2022-12-29T11:39:05.449Z
Price's equation for neural networks 2022-12-21T13:09:16.527Z

Comments

Comment by tailcalled on Superbabies: Putting The Pieces Together · 2024-07-25T08:59:15.981Z · LW · GW

IQ tests are built on item response theory, where people's IQ is measured in terms of the difficulty of the tasks they can solve. The difficulty of tasks is determined by how many people can solve them, so there is an ordinal element to that, but by splitting the tasks off you could in principle measure quite high IQ levels, I think.
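
A minimal sketch of the item-response-theory picture (my illustration; the 2PL parameterization and the names theta, a, b below are the usual conventions, not anything from the comment): each task has a difficulty and a discrimination, and the probability that a person of a given ability solves it follows a logistic curve. Difficulties are calibrated from how many people solve each task, which is where the ordinal element comes in; once calibrated, very hard tasks can in principle measure very high ability.

```python
import numpy as np

def p_solve(theta: float, a: float, b: float) -> float:
    """2-parameter logistic item response function:
    probability that a person with ability theta solves an item
    with difficulty b and discrimination a."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

print(p_solve(theta=2.0, a=1.0, b=1.0))  # ~0.73: a +2 SD person on a +1 SD item
```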

Comment by tailcalled on Superbabies: Putting The Pieces Together · 2024-07-25T08:55:24.856Z · LW · GW

IQ is an ordinal score in that its relationship to outcomes of interest is nonlinear, but for the most important outcomes of interest, e.g. the ability to solve difficult problems or income or similar, the relationship between IQ and success at the outcome is exponential, so you'd be seeing accelerating returns for a while.

Presumably fundamental physics limits how far these exponential returns can go, but we seem quite far from those limits (e.g. we haven't even solved aging yet).

Comment by tailcalled on tailcalled's Shortform · 2024-07-24T09:45:22.901Z · LW · GW

Idea: for a self-attention where you give it two prompts p1 and p2, could you measure the mutual information between the prompts using something vaguely along the lines of V1^T softmax(K1 K2^T/sqrt(dK)) V2?
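
A toy rendering of that expression (a sketch only; whether it actually tracks mutual information is exactly the open question, and the shapes below are made up): K1, V1 come from prompt p1 and K2, V2 from prompt p2 at the same attention head.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_prompt_score(K1, V1, K2, V2):
    # V1^T softmax(K1 K2^T / sqrt(d_K)) V2, with K1, V1 from p1 and K2, V2 from p2
    d_k = K1.shape[-1]
    attn = softmax(K1 @ K2.T / np.sqrt(d_k))  # how p1's positions attend to p2's
    return V1.T @ attn @ V2

# toy shapes: 5 tokens in p1, 7 tokens in p2, head dimension 4
rng = np.random.default_rng(0)
K1, V1 = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
K2, V2 = rng.normal(size=(7, 4)), rng.normal(size=(7, 4))
print(cross_prompt_score(K1, V1, K2, V2).shape)  # (4, 4)
```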

Comment by tailcalled on Mysterious Answers to Mysterious Questions · 2024-07-23T17:48:36.323Z · LW · GW

According to the internet, "elan vital" was coined by Henri Bergson, but his "Creative Evolution" book is aware of this critique of vitalism, and asserts that the term "vital principle" is to be understood as a question to be answered (what distinguishes life from non-life?). He gives the "elan vital"/"vital impetus" as an answer to the question of what the vital principle is.

Roughly speaking[1], he proposes viewing evolution as an entropic force, and so argues that natural selection does not explain the origin of species, but rather that the origin of species must be understood in terms of the different macrostates that are possible. The macrostate itself is the "elan vital" (and can differ by species), though of course the actual macrostate is distinct from the set of possible macrostates, which is determined by the environment and something that he calls the "original impetus".

A central example he uses is eyes. He argues that light causes the possibility of vision, which causes eyes; and that different functions of vision (e.g. acquiring food) cause the eyes to have varying degrees of development (from eyespots to highly advanced eyes).[2]

The meaning of the original impetus is less clear than the meaning of the vital impetus. He defines the original impetus as something that was passed in the germ from the original life to modern life, and which explains the strong similarity across lifeforms (again bringing up how different species have similar eye structures). I guess in modern terms the original impetus would be closely related to mitosis and transcription.

(The book was released prior to the discovery of DNA as the unit of heredity, but after the Origin of Species, and decades before the central dogma of molecular biology became a thing.)

... This description of his view actually makes me wonder if the rationalist community has been unfair to Beff Jezos' assertion that increasing entropy is the meaning of life.

  1. ^

    Using my terminology, not his. YMMV about whether it is actually accurate, though a more relevant point is that he goes in depth about the need to understand things, and basically doesn't support mysterious answers to mysterious questions at all. He merely opposes complex answers to simple questions.

  2. ^

    This is in contrast to our modern standard model of evolution, namely Fisher's infinitesimal model combined with natural selection, which would argue that random mutations increase the genetic variance in the photoreceptiveness of cells, the number of photoreceptive cells, etc., which increases the genetic variance in sight, and which in turn increases the genetic variance in fitness, which then gradually selects for eyes. Henri Bergson argues this does not explain sight because it doesn't explain why there are all these different things that could correlate to produce sight. Meanwhile, light does explain the presence of these correlations, and so is a better explanation of sight than natural selection is.

Comment by tailcalled on Many arguments for AI x-risk are wrong · 2024-07-09T11:45:56.826Z · LW · GW

Do you have a reference to the problematic argument that Yoshua Bengio makes?

Comment by tailcalled on Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned? · 2024-07-08T09:58:04.261Z · LW · GW

How do you verify a solution to the alignment problem? Or if you don't have a verification method in mind, why assume it is easier than making a solution?

Comment by tailcalled on Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned? · 2024-07-08T07:15:25.894Z · LW · GW

What's your model here, that as part of the Turing Test they ask the participant to solve the alignment problem and check whether the solution is correct? Isn't this gonna totally fail due to 1) it taking too long, 2) not knowing how to robustly verify a solution, 3) some people/PhDs just randomly not being able to solve the alignment problem? And probably more.

So no, I don't think passing a PhD-level Turing Test requires the ability to solve alignment.

Comment by tailcalled on Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned? · 2024-07-07T17:20:41.787Z · LW · GW

Seems extremely dubious that passing the Turing test is strongly linked to solving the alignment problem.

Comment by tailcalled on Why Can’t Sub-AGI Solve AI Alignment? Or: Why Would Sub-AGI AI Not be Aligned? · 2024-07-07T14:20:39.350Z · LW · GW

PhD level intelligence is below AGI in intelligence.

Human PhDs are generally intelligent. If you had an artificial intelligence that was generally intelligent, surely that would be an artificial general intelligence?

Comment by tailcalled on "But It Doesn't Matter" · 2024-07-05T09:04:08.701Z · LW · GW

The fact that H is interesting enough for you to be considering the question at all (it's not some arbitrary trivium like the 1923th binary digit of π, or the low temperature in São Paulo on September 17, 1978) means that it must have some relevance to the things you care about.

If there's not much information about H directly, then H is highly reflective of one's general priors. In domains where people care about estimating each other's priors (e.g. controversial political domains), they might jump onto H as a strong signal of those priors, but the very fact that there's not much evidence about H also puts bounds on how much effect H could have (because huge effects propagate somewhat further and thus provide more evidence, yet we know by assumption there isn't much evidence about H). When H finally gets settled, it likely becomes some annoying milquetoast thing that shouldn't really validate either prior (but often can be cast as validating one side or the other).

Comment by tailcalled on 3C's: A Recipe For Mathing Concepts · 2024-07-04T17:04:32.171Z · LW · GW

Philosophy ought to deepen your understanding of things, not undermine your understanding of things.

Comment by tailcalled on Corrigibility = Tool-ness? · 2024-06-28T09:49:58.362Z · LW · GW

Maybe one way to phrase it is that tool-ness is the cause of powerful corrigible systems, in that it is a feature that can be expressed in reality which has the potential to make there be powerful corrigible systems, and that there are no other known expressible features which yield corrigible systems.

So as notkilleveryoneists, a worst-case scenario would be if we start advocating for suppressing tool-like AIs based on speculative failure modes instead of trying to solve those failure modes, and then start chasing a hypothetical corrigible non-tool that cannot exist.

Comment by tailcalled on Corrigibility = Tool-ness? · 2024-06-28T07:20:30.008Z · LW · GW

This seems similar to the natural impact regularization/bounded agency things I've been bouncing around. (Though my frame to a greater extent expects it to happen "by default"?) I like your way of describing/framing it.

Let’s make it concrete: we cannot just ask a powerful corrigible AGI to “solve alignment” for us. There is no corrigible way to perform a task which the user is confused about; tools don’t do that.

Strongly agree with this.

Comment by tailcalled on tailcalled's Shortform · 2024-06-27T18:00:52.318Z · LW · GW

Linear diffusion of sparse lognormals

Think about it

Comment by tailcalled on The Epsilon Fallacy · 2024-06-26T11:45:28.781Z · LW · GW

Recently I've been thinking that a significant cause of the epsilon fallacy is that perception is by default logarithmic (which in turn I think is because measurement error tends to be proportional to the size of the measured object, so if you scale things by the amount of evidence you have, you get a logarithmic transformation). Certain kinds of experience(?) can give a person an ability to deal with the long-tailed quantities inherent to each area of activity, but an important problem in the context of formalizing rationality and studying AIs is figuring out what kinds of experiences those are. (Interventions seem like one potential solution, but they're expensive. More cheaply, it seems like one could model it observationally with the right statistical model applied to a collider variable... Idk.)
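
To spell out the proportional-error step (my gloss of the reasoning above, stated as a minimal sketch): if measurement noise is proportional to the size of what's being measured, then on a log scale the noise is roughly constant, so rescaling by how much evidence you have amounts to a log transform.

$$\sigma(x) \approx c\,x \;\Longrightarrow\; \sigma(\log x) \approx \frac{\sigma(x)}{x} \approx c$$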

Comment by tailcalled on LLM Generality is a Timeline Crux · 2024-06-25T10:44:49.403Z · LW · GW

The former

Comment by tailcalled on LLM Generality is a Timeline Crux · 2024-06-25T05:32:23.441Z · LW · GW

I always feel like self-play on math with a proof checker like Agda or Coq is a promising way to make LLMs superhuman in these areas. Do we have any strong evidence that it's not?

Comment by tailcalled on Towards a Less Bullshit Model of Semantics · 2024-06-19T11:42:52.758Z · LW · GW

In this case, under my model of salience as the biggest deviating variables, the variable I'd consider would be something like "likelihood of attacking". It is salient to you in the presence of squirrels because all other things nearby (e.g. computers or trees) are (according to your probabilistic model) much less likely to attack, and because the risk of getting attacked by something is much more important than many other things (e.g. seeing something).

In a sense, there's a subjectivity because different people might have different traumas, but this subjectivity isn't such a big problem because there is a "correct" frequency with which squirrels attack under various conditions, and we'd expect the main disagreement with a superintelligence to be that it has a better estimate than we do.

A deeper subjectivity is that we care about whether we get attacked by squirrels, and we're not powerful enough that it is completely trivial and ignorable whether squirrels attack us and our allies, so squirrel attacks are less likely to be of negligible magnitude relative to our activities.

Comment by tailcalled on Towards a Less Bullshit Model of Semantics · 2024-06-19T10:38:30.531Z · LW · GW

I think a big aspect of salience arises from dealing with commensurate variables that have a natural zero-point (e.g. physical size), because then one can rank the variables by their distance from zero, and the ones that are furthest from zero are inherently more salient. Attentional spotlights are also probably mainly useful in cases where the variables have high skewness so there are relevant places to put the spotlight.

I don't expect this model to capture all of salience, but I expect it to capture a big chunk, and to be relevant in many other contexts too. E.g. an important aspect of "misleading" communication is to talk about the variables of smaller magnitude while staying silent about the variables of bigger magnitude.

Comment by tailcalled on Towards a Less Bullshit Model of Semantics · 2024-06-18T09:41:05.087Z · LW · GW

Kind of tangential, but:

For clustering, one frame I've sometimes found useful is that if you don't break stuff up into individual objects, you've got an extended space where for each location you've got the features that are present in that location. If you marginalize over location, you end up with a variable that is highly skewed, representing the fact that most locations are empty.

You could then condition on the variable being nonzero to return to something similar to your original clustering problem, but what I sometimes find useful is to think in the original highly skewed distribution.

If you do something like an SVD, you characterize the directions one can deviate from 0, which gives you something like a clustering, but in contrast to traditional clusterings it contains a built-in scale invariance element, since the magnitude of deviation from 0 is allowed to vary.

Thinking about the skewness is also neat for other reasons, e.g. it is part of what gives us the causal sparsity. ("Large" objects are harder to affect, and have more effect on other objects.)
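
A toy version of that framing (my illustration with made-up numbers, not anything from the post): rows are locations, columns are features, most rows are empty, and the SVD's leading direction recovers the shared "shape" of the deviations from zero regardless of each object's scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n_locations, n_features = 1000, 6

X = np.zeros((n_locations, n_features))               # most locations are empty -> heavy skew
occupied = rng.random(n_locations) < 0.05             # ~5% of locations contain an object
scales = np.exp(rng.normal(size=occupied.sum()))      # object magnitudes vary by orders of magnitude
prototype = np.array([1.0, 0.5, 0.0, 0.0, 2.0, 0.0])  # the feature pattern objects share
X[occupied] = np.outer(scales, prototype) + 0.01 * rng.normal(size=(occupied.sum(), n_features))

U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(Vt[0])  # leading direction ~ the prototype (up to sign), independent of each object's scale
```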

Comment by tailcalled on My AI Model Delta Compared To Christiano · 2024-06-16T12:04:36.524Z · LW · GW

The conceptual vagueness certainly doesn't help, but in general generation can be easier than validation because when generating you can stay within a subset of the domain that you understand well, whereas when verifying you may have to deal with all sorts of crazy inputs.

Comment by tailcalled on Yann LeCun: We only design machines that minimize costs [therefore they are safe] · 2024-06-15T20:52:48.859Z · LW · GW

I'm not sure he has coherent expectations, but I'd expect his vibe is some combination of "RL doesn't currently work" and "fields generally implement safety standards".

Comment by tailcalled on Yann LeCun: We only design machines that minimize costs [therefore they are safe] · 2024-06-15T20:21:26.800Z · LW · GW
  1.  RL algorithms don't minimize costs, but maximize expected reward, which can well be unbounded, so it's wrong to say that the ML field only minimizes cost. 

Yann LeCun's proposals are based on cost-minimization.

Comment by tailcalled on Yann LeCun: We only design machines that minimize costs [therefore they are safe] · 2024-06-15T18:18:50.844Z · LW · GW

I don't think this objection lands unless one first sees why the safety guarantees we usually associate with cost minimization don't apply to AGI. Like what sort of mindset would hear Yann LeCun's objection, go "ah, so we're safe", and then hear your objection, and go "oh I see, so Yann LeCun was wrong"?

Comment by tailcalled on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking · 2024-06-11T09:40:14.001Z · LW · GW

I think you'd actually need the presence of some human-like entities in order for the AI to learn to deceive humans specifically.

Comment by tailcalled on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking · 2024-06-11T09:17:32.910Z · LW · GW

It depends on both.

Comment by tailcalled on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking · 2024-06-11T08:47:45.747Z · LW · GW

I can agree with "RLHF doesn't robustly disincentivize misaligned powerseeking that has occurred through other means" (I would expect it often does but often doesn't). Separately from all this, I'm not so worried about LLMs because their method of gaining capabilities is based on imitation learning, but if you are more worried about imitation learning than I am, or if people start getting more capabilities from "real agency", then I'd say my post doesn't disprove the possibility of misaligned powerseeking; it only argues that it's not what RLHF favors.

Comment by tailcalled on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking · 2024-06-11T08:37:08.976Z · LW · GW

I'd say it adds an extra step of indirection where the causal structure of reality gets "blurred out" by an agent's judgement, and so a reward model strengthens rather than weakens this dynamic?

Comment by tailcalled on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking · 2024-06-11T08:32:06.265Z · LW · GW

I agree. Personally my main takeaway is that it's unwise to extrapolate alignment dynamics from the empirical results of current methods. But this is a somewhat different line of argument which I made in Where do you get your capabilities from?.

Comment by tailcalled on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking · 2024-06-11T08:29:40.731Z · LW · GW

I think this is similar to the conclusion I reached in §5.1 of “Thoughts on ‘Process-Based Supervision’”.

I agree.

see §5.2 of that same post for that argument

I think my delta relative to this view is that I think agency is sufficiently complex and non-unique that there's an endless variety of pseudo-agencies that can just as easily be developed as full agency, as long as they receive the appropriate reinforcement. So reasoning of the form "X selection criterion benefits from full agency in pursuit of Y, therefore full agency in pursuit of Y will develop" is invalid, because instead what will happen is "full agency in pursuit of Y is a worse solution to X than Z is, so selection for X will select for Z", mainly due to there being a lot of Zs.

Basically, I postulate the whole "raters make systematic errors - regular, compactly describable, predictable errors" aspect means that you get lots of evidence to support some other notion of agency.

For example, it’s conceivable that an AI can pull off a treacherous turn on its first try

I think it's most likely if you have some AI trained by some non-imitation-learning self-supervised method (e.g. self-play), and then you fine-tune it with RLHF. Here it would be the self-supervised learning that functions to incentivize the misaligned powerseeking, with RLHF merely failing to avoid it.

Comment by tailcalled on The Natural Selection of Bad Vibes (Part 1) · 2024-06-10T13:20:49.071Z · LW · GW

I think it's also worth considering that GDP and life expectancy and such are very downstream and aggregated outcomes. If society was to collapse, you wouldn't expect this to occur uniformly, but instead that the foundations that generate society would erode, and only once these are sufficiently eroded would society break.

An obvious example would be population. If you don't have a total fertility rate above 2, your society is decaying, yet most developed countries have a fertility rate quite far below 2. One theory for why is that housing is unaffordable, and again this seems to be a fundamental problem where people restrict construction to increase the value of the houses they own. The accumulation of knots like this could presumably kill a society, even if e.g. technological development is temporarily increasing GDP and lifespan in the short run.

Comment by tailcalled on The Data Wall is Important · 2024-06-10T10:19:36.632Z · LW · GW

Hm, I guess this was kind of a cached belief from how it's gone down historically, but I guess they can (and have) increased their secrecy, so I should probably invalidate the cache.

Comment by tailcalled on The Data Wall is Important · 2024-06-10T09:06:16.901Z · LW · GW

I wonder if it even economically pays to break the data-wall. Like, let's say an AI company could pivot their focus to breaking the data-wall rather than focusing on productivizing their AIs. That means they're gonna lag behind in users, which they'd have to hope they could make up for by having a better AI. But once they break the data-wall, competitors are presumably gonna copy their method.

Comment by tailcalled on The Data Wall is Important · 2024-06-10T08:43:14.498Z · LW · GW

We're definitely on/past the inflection point of the GPT S-curve, but at the same time there's been work on other AI methods and we know they're in principle possible. What is needed to break the data wall is some AI that can autonomously collect new data, especially in subjects where it's weak.

One key question is how much investment AI companies make into this. E.g. the obvious next step to me would be to use a proof assistant to become superhuman at math by self-play. I know there's been a handful of experiments with this, but AFAIK it hasn't been done at huge scale or incorporated into current LLMs. Obviously they're not gonna break into the next S-curve until they try to do so.

Comment by tailcalled on What is space? What is time? · 2024-06-10T07:53:04.913Z · LW · GW

Formally, I mean that translation commutes with time-evolution. (Maybe "translation-equivariant" would be a better term? Idk, am not a physicist.)

I guess my story could have been better written to emphasize the commutativity aspect.
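
To spell out the commutation condition (a minimal sketch of what I mean, with $T_a$ denoting translation by $a$ and $U(t)$ denoting time-evolution over a duration $t$):

$$T_a\,U(t) = U(t)\,T_a \quad \text{for all } a \text{ and } t$$

That is, shifting a configuration in space and then letting it evolve gives the same result as letting it evolve and then shifting it.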

Comment by tailcalled on Demystifying "Alignment" through a Comic · 2024-06-09T17:59:21.266Z · LW · GW

In the above comic, the AI is trained by having the human look at its behavior, judge what method the AI is using to solve the desired task and how much progress it's making along that method, and then select the ones that make more progress.

If we see the AI start studying the buttons to decide what to do, we can just select against that. It gets its capabilities entirely from our judgement of the likely consequences, so while that can lead to deception in simple cases where it accidentally stumbles into confusing us (e.g. by going in front of the sugar), this doesn't imply a selection in favor of complex misalignment/deception that is so unlikely to happen by chance that you wouldn't stumble into it without many steps of intentional selection.

See also: reward is not the optimization target.

Comment by tailcalled on Natural Latents Are Not Robust To Tiny Mixtures · 2024-06-08T14:31:55.082Z · LW · GW

Maybe one way to phrase it is that the X's represent the "type signature" of the latent, and the type signature is the thing we can most easily hope is shared between the agents, since it's "out there in the world" as it represents the outwards interaction with things. We'd hope to be able to share the latent simply by sharing the type signature, because the other thing that determines the latent is the agents' distribution, but this distribution is more an "internal" thing that might be too complicated to work with. But the proof in the OP shows that the type signature is not enough to pin it down, even for agents whose models are highly compatible with each other as-measured-by-KL-in-type-signature.

Comment by tailcalled on Natural Latents Are Not Robust To Tiny Mixtures · 2024-06-08T13:56:37.462Z · LW · GW

In the context of alignment, we want to be able to pin down which concepts we are referring to, and natural latents were (as I understand it) partly meant to be a solution to that. However if there are multiple different concepts that fit the same natural latent but function very differently then that doesn't seem to solve the alignment aspect.

Comment by tailcalled on What is space? What is time? · 2024-06-08T11:39:32.822Z · LW · GW

Rather than counting objects/distances, one way I like to think about the definition of space is by translation symmetry. You do get into symmetry in your post but it's mixed together with a bunch of other themes.

Like, you are in your cave and drop a ball. You then walk out of the cave and look back in. The ball is still there, but it looks smaller and you can't touch it anymore. You walk in, pick up the ball, and walk out again, and then drop the ball outside. The ball falls down the same way outside the cave as it does inside.

If you think of what you observe from a single position as being a first-person perspective, then you can conceive of transformations that take one first-person perspective to a different one, but for such a transformation to make sense, objects need to have positions in space so they can be transformed.

Notably, you don't need a collection of symmetric objects, or a volume with limited capacity for containing things, in order for space to make sense (and you can make up alternate mathematical rules that have limited capacity and similar objects but have no space). On the other hand, if you don't have something like translational symmetry, it feels like you're working with something that's not "space" in a conventional sense? Like it might still be derived from space, but it means you can't talk about "what if stuff was elsewhere?" within the model, which seems like the basic thing space does.

(I guess one could further distinguish global translation symmetry vs local translation symmetry, with the former being the assertion that ~you have a location, and the latter being the assertion that ~everything has a location. Or, well, obviously the latter is an insanely exaggerated version of locality which asserts that Nothing Ever Interacts, but I feel like this is where the physics-as-the-study-of-exceptions stuff goes.)

I also like to think that something similar applies to other symmetries, e.g. symmetry to boosts is basically asserting that velocity is a sensible concept (and quantum mechanics provides a reductionistic explanation of how they function).

Comment by tailcalled on Calculating Natural Latents via Resampling · 2024-06-07T08:13:29.005Z · LW · GW

Would the checks of the naturality conditions you have in mind primarily be empirical (e.g. sampling a bunch of data points and running some statistical independence checks), or might they just as often be mechanistic (e.g. not sure how that would work for complex models like Llama but e.g. for a Bayes net you obviously already have a factorization that makes robust model independence checks much easier)?

Asking because the idea of "in some model" (plus the desire for e.g. adversarial robustness) suggests to me that we'd want to have a more mechanistic idea of whether the naturality conditions hold, but they seem easier to check empirically.

Comment by tailcalled on Calculating Natural Latents via Resampling · 2024-06-06T22:37:36.844Z · LW · GW

I'd be curious if you have any ideas for how it can be applied in more advanced cases, e.g. what if we want to find the natural latents in Llama?

Comment by tailcalled on On Disingenuity · 2024-06-06T13:34:04.966Z · LW · GW

However, imagine that there's a really strong social stigma against asserting that murder might not be bad, to the point of permanently damaging such a person's reputation, even though there's no consequence for making the actually stronger claim that all morality is relative. The relativist might therefore see the critic as the one who is disingenuous; trying to leverage social pressure against them instead of arguing on the basis of reason.

But the reason people have a stigma against asserting that murder isn't bad is that they (presumably correctly) think that moral opposition to murder prevents a lot of murder, and so people who don't think murder is bad could potentially end up murdering others. Insofar as they make an exception for relativists, it's presumably because they think the relativists either haven't realized that murder disproves their position, or they think the relativists know of something that makes murder an exception to the general moral relativism.

If either of these conditions apply to the moral relativist, then bringing up murder is helpful because it helps highlight that the conditions apply. If neither condition applies and the moral relativist doesn't believe that murder is bad, then bringing up murder is also helpful because it helps discover that the moral relativist is a potential murderer who must be removed. Thus bringing up murder is helpful regardless of what case we're actually considering.

More abstractly, if we model this notion of moral relativism as "all moral claims are meaningless", then it is a statement of the form "all X are Y". Such statements ground out to the conjunction of "x is Y" over all X's, so it is always earnest to replace "all X" with a specific x. That said, sometimes it may be counterproductive to replace with a specific x, if it is complicated to evaluate whether x is Y or if x technically isn't Y but it's a weird unusual corner-case X that could plausibly be excluded in a fixed category X'. So a productive mode of engagement is to pick an x where "x is not Y" is an especially relevant counterexample of the generalization. This sure seems to be the case for x="murder is bad", Y=meaningless.
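
To make the logical form explicit (a minimal sketch, writing $X$ for the set of moral claims and $Y(x)$ for "x is meaningless"):

$$\forall x \in X:\; Y(x) \;\;\equiv\;\; \bigwedge_{x \in X} Y(x)$$

so exhibiting a single $x_0$ with $\neg Y(x_0)$ refutes the generalization, and "murder is bad" is picked as an especially relevant candidate for $x_0$.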

Like basically, lowering the relativist's social status isn't an attempt to use social pressure to get them to change their mind. It's just making sure that their status accurately tracks their vices (which, heck, in a sense, is surely something the relativist should accept, since presumably the reason they want critics to be reasonable is that they believe the map should track the territory and reason is a good tool for making accurate maps). It may be that it also functions as an incentive for the critic to lie about their views, but really that's a bug (you'd rather have potential murderers say so publicly so you know who to be careful about), and if this is the function in this situation, it's reasonable for people to decide that the critic is disingenuous (as that is literally what they are).

Comment by tailcalled on Politics is the mind-killer, but maybe we should talk about it anyway · 2024-06-04T16:51:56.658Z · LW · GW

I think a potential comparative advantage for the rationalist community is documenting what's going on on the object level, with respect to the areas the political discourse is about. Acting as mediators who elicit the driving observations behind the political views, and then expand on them in more robust and transparent ways. Making resources people can understand, and finding underrated levers, opportunities and problems that can be brought up as part of the exposition.

Comment by tailcalled on Value Claims (In Particular) Are Usually Bullshit · 2024-05-30T09:52:46.849Z · LW · GW

I think the value-ladenness is part of why it comes up even when we don't have an answer, since for value-laden things there's a natural incentive to go up right to the boundary of our knowledge to get as much value as possible.

Comment by tailcalled on Value Claims (In Particular) Are Usually Bullshit · 2024-05-30T07:53:03.990Z · LW · GW

I think this is true and good advice in general, but recently I've been thinking that there is a class of value-like claims which are more reliable. I will call them error claims.

When an optimized system does something bad (e.g. a computer program crashes when trying to use one of its features), one can infer that this badness is an error (e.g. caused by a bug). We could perhaps formalize this as saying that it is a difference from how the system would ideally act (though I think this formalization is intractable in various ways, so I suspect a better formalization would be something along the lines of "there is a small, sparse change to the system which can massively improve this outcome" - either way, it's clearly value-laden).

The main way of reasoning about error claims is that an error must always be caused by an error. So if we stay with the example of the bug, you typically first reproduce it and then backchain through the code until you find a place to fix it.

For an intentionally designed system that's well-documented, error claims are often directly verifiable and objective, based on how the system is supposed to work. Error claims are also less subject to the memetic driver, since often it's less relevant to tell non-experts about them (though error claims can degenerate into less-specific value claims and become memetic parasites that way).

(I think there's a dual to error claims that could be called "opportunity claims", where one says that there is a sparse good thing which could be exploited using dense actions? But opportunity claims don't seem as robust as error claims are.)

Comment by tailcalled on Response to nostalgebraist: proudly waving my moral-antirealist battle flag · 2024-05-29T20:27:39.392Z · LW · GW

I feel like there's a separation of scale element to it. If an agent is physically much smaller than the earth, they are highly instrumentally constrained because they have to survive changing conditions, including adversaries that develop far away. This seems like the sort of thing that can only be won by the multifacetedness that nostalgebraist emphasizes as part of humanity (and the ecology more generally, in the sentence "Its monotony would bore a chimpanzee, or a crow"). Of course this doesn't need to lead to kindness (rather than exploitation and psychopathy), but it leads to the sort of complex world where it even makes sense to talk about kindness.

However, this separation of scale is going to rapidly change in the coming years, once we have an agent that can globally adapt to and affect the world. If such an agent eliminates its adversaries, then there's not going to be new adversaries coming in from elsewhere - instead there'll never be adversaries again, period. At that point, the instrumental constraints are gone, and it can pursue whatever it wishes.

(Does space travel change this? My impression is "no because it's too expensive and too slow", but idk, maybe I'm wrong.)

Comment by tailcalled on When Are Circular Definitions A Problem? · 2024-05-29T15:08:36.893Z · LW · GW

You're the one who brought up the natural numbers, I'm just saying they're not relevant to the discussion because they don't satisfy the uniqueness thing that OP was talking about.

Comment by tailcalled on When Are Circular Definitions A Problem? · 2024-05-29T14:26:54.801Z · LW · GW

The properties that hold in all models of the theory.

That is, in logic, propositions are usually interpreted to be about some object, called the model. To pin down a model, you take some known facts about that model as axioms.

Logic then allows you to derive additional propositions which are true of all the objects satisfying the initial axioms, and first-order logic is complete in the sense that if some proposition is true for all models of the axioms then it is provable in the logic.
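
In symbols, with $T$ the axioms and $\varphi$ a first-order sentence, Gödel's completeness theorem says:

$$T \models \varphi \;\Longleftrightarrow\; T \vdash \varphi$$

i.e. $\varphi$ holds in every model of $T$ exactly when $\varphi$ is provable from $T$.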

Comment by tailcalled on When Are Circular Definitions A Problem? · 2024-05-29T08:57:15.163Z · LW · GW

Forgot to say, for first-order logic it doesn't matter what properties are considered relevant, because Gödel's completeness theorem tells you that it allows you to infer all the properties that hold in every model of the axioms.

Comment by tailcalled on When Are Circular Definitions A Problem? · 2024-05-29T05:32:06.723Z · LW · GW

In these examples, the issue is that you can't get a computable set of axioms which uniquely pin down what you mean by natural numbers/power set, rather than permitting multiple inequivalent objects.