Posts

TsviBT's Shortform 2024-06-16T23:22:54.134Z
Koan: divining alien datastructures from RAM activations 2024-04-05T18:04:57.280Z
What could a policy banning AGI look like? 2024-03-13T14:19:07.783Z
A hermeneutic net for agency 2024-01-01T08:06:30.289Z
What is wisdom? 2023-11-14T02:13:49.681Z
Human wanting 2023-10-24T01:05:39.374Z
Hints about where values come from 2023-10-18T00:07:58.051Z
Time is homogeneous sequentially-composable determination 2023-10-08T14:58:15.913Z
Telopheme, telophore, and telotect 2023-09-17T16:24:03.365Z
Sum-threshold attacks 2023-09-08T17:13:37.044Z
Fundamental question: What determines a mind's effects? 2023-09-03T17:15:41.814Z
Views on when AGI comes and on strategy to reduce existential risk 2023-07-08T09:00:19.735Z
The fraught voyage of aligned novelty 2023-06-26T19:10:42.195Z
Provisionality 2023-06-19T11:49:06.680Z
Explicitness 2023-06-12T15:05:04.962Z
Wildfire of strategicness 2023-06-05T13:59:17.316Z
The possible shared Craft of deliberate Lexicogenesis 2023-05-20T05:56:41.829Z
A strong mind continues its trajectory of creativity 2023-05-14T17:24:00.337Z
Better debates 2023-05-10T19:34:29.148Z
An anthropomorphic AI dilemma 2023-05-07T12:44:48.449Z
The voyage of novelty 2023-04-30T12:52:16.817Z
Endo-, Dia-, Para-, and Ecto-systemic novelty 2023-04-23T12:25:12.782Z
Possibilizing vs. actualizing 2023-04-16T15:55:40.330Z
Expanding the domain of discourse reveals structure already there but hidden 2023-04-09T13:36:28.566Z
Ultimate ends may be easily hidable behind convergent subgoals 2023-04-02T14:51:23.245Z
New Alignment Research Agenda: Massive Multiplayer Organism Oversight 2023-04-01T08:02:13.474Z
Descriptive vs. specifiable values 2023-03-26T09:10:56.334Z
Shell games 2023-03-19T10:43:44.184Z
Are there cognitive realms? 2023-03-12T19:28:52.935Z
Do humans derive values from fictitious imputed coherence? 2023-03-05T15:23:04.065Z
Counting-down vs. counting-up coherence 2023-02-27T14:59:39.041Z
Does novel understanding imply novel agency / values? 2023-02-19T14:41:40.115Z
Please don't throw your mind away 2023-02-15T21:41:05.988Z
The conceptual Doppelgänger problem 2023-02-12T17:23:56.278Z
Control 2023-02-05T16:16:41.015Z
Structure, creativity, and novelty 2023-01-29T14:30:19.459Z
Gemini modeling 2023-01-22T14:28:20.671Z
Non-directed conceptual founding 2023-01-15T14:56:36.940Z
Dangers of deference 2023-01-08T14:36:33.454Z
The Thingness of Things 2023-01-01T22:19:08.026Z
[link] The Lion and the Worm 2022-05-16T20:40:22.659Z
Harms and possibilities of schooling 2022-02-22T07:48:09.542Z
Rituals and symbolism 2022-02-10T16:00:14.635Z
Index of some decision theory posts 2017-03-08T22:30:05.000Z
Open problem: thin logical priors 2017-01-11T20:00:08.000Z
Training Garrabrant inductors to predict counterfactuals 2016-10-27T02:41:49.000Z
Desiderata for decision theory 2016-10-27T02:10:48.000Z
Failures of throttling logical information 2016-02-24T22:05:51.000Z
Speculations on information under logical uncertainty 2016-02-24T21:58:57.000Z
Existence of distributions that are expectation-reflective and know it 2015-12-10T07:35:57.000Z

Comments

Comment by TsviBT on jacobjacob's Shortform Feed · 2024-07-24T02:41:49.882Z · LW · GW

So what am I supposed to do if people who control resources that are nominally earmarked for purposes I most care about are behaving this way?

Comment by TsviBT on jacobjacob's Shortform Feed · 2024-07-24T02:41:03.372Z · LW · GW

What are "autists" supposed to do in a context like this?

Comment by TsviBT on Koan: divining alien datastructures from RAM activations · 2024-07-23T17:12:33.930Z · LW · GW

I mean the main thing I'd say here is that we just are going way too slowly / are not close enough. I'm not sure what counts as "jettisoning"; no reason to totally ignore anything, but in terms of reallocating effort, I guess what I advocate for looks like jettisoning everything. If you go from 0% or 2% of your efforts put toward questioning basic assumptions and theorizing based on introspective inspection and manipulation of thinking, to 50% or 80%, then in some sense you've jettisoned everything? Or half-jettisoned it?

Comment by TsviBT on Koan: divining alien datastructures from RAM activations · 2024-07-23T15:40:57.402Z · LW · GW

Thanks, this is helpful to me.

An example of something: do LLMs have real understanding, in the way humans do? There's a bunch of legible stuff that people would naturally pay attention to as datapoints associated with whatever humans do that's called "real understanding". E.g. being able to produce grammatical sentences, being able to answer a wide range of related questions correctly, writing a poem with s-initial words, etc. People might have even considered those datapoints dispositive for real understanding. And now LLMs can do those. ... Now, according to me LLMs don't have much real understanding, in the relevant sense or in the sense humans do. But it's much harder to point at clear, legible benchmarks that show that LLMs don't really understand much, compared to previous ML systems.

then clearly some of those datapoints are more useful than others (as brainstorming aids for developing the underlying theoretical framework),

The "as brainstorming aids for developing the underlying theoretical framework" is doing a lot of work there. I'm noticing here that when someone says "we can try to understand XYZ by looking at legible thing ABC", I often jump to conclusions (usually correctly actually) about the extent to which they are or aren't trying to push past ABC to get to XYZ with their thinking. A key point of the OP is that some datapoints may be helpful, but they aren't the main thing determining whether you get to [the understanding you want] quickly or slowly. The main thing is, vaguely, how you're doing the brainstorming for developing the underlying theoretical framework.

I don’t see why “legible phenomena” datapoints would be systematically worse than other datapoints.

I'm not saying all legible data is bad or irrelevant. I like thinking about human behavior, about evolution, about animal behavior; and my own thoughts are my primary data, which isn't like maximally illegible or something. I'm just saying I'm suspicious of all legible data. Why?

Because there's more coreward data available. That's the argument of the OP: you actually do know how to relevantly theorize (e.g., go off and build a computer--which in the background involves theorizing about datastructures).

Because people streetlight, so they're selecting points for being legible, which cuts against being close to the core of the thing you want to understand.

Because theorizing isn't only, or even always mainly, about data. It's also about constructing new ideas. That's a distinct task; data can be helpful, but there's no guarantee that reading the book of nature will lead you along such that in the background you construct the ideas you needed.

For example, the phenomenon “If I feel cold, then I might walk upstairs and put on a sweater” is “legible”, right? But if someone is in the very early stages of developing a theoretical framework related to goals and motivations, then they sure need to have examples like that in the front of their minds, right? (Or maybe you wouldn't call that example “legible”?)

It's legible, yeah. They should have it in mind, yeah. But after they've thought about it for a while they should notice that the real movers and shakers of the world are weird illegible things like religious belief, governments, progressivism, curiosity, invention, companies, child-rearing, math, resentment, ..., which aren't very relevantly described by the sort of theories people usually come up with when just staring at stuff like cold->sweater, AFAIK.

Comment by TsviBT on Koan: divining alien datastructures from RAM activations · 2024-07-23T12:44:57.177Z · LW · GW

Hm. I think my statement does firmly include the linked paper (at least the first half of it, insofar as I skimmed it).

It's becoming clear that a lot of my statements have background mindsets that would take more substantial focused work to exposit. I'll make some gestural comments.

  • When I say "not a good way..." I mean something like "is not among the top X elements of a portfolio aimed at solving this in 30 years (but may very well be among the top X elements of a portfolio aimed at solving this in 300 years)".
  • Streetlighting, in a very broad sense that encompasses most or maybe all of foregoing science, is a very good strategy for making scientific progress--maybe the only strategy known to work. But it seems to be too slow. So I'm not assuming that "good" is about comparisons between different streetlights; if I were, then I'd consider lots of linguistic investigations to be "good".
  • In fairly wide generality, I'm suspicious of legible phenomena.
    • (This may sound like an extreme statement; yes, I'm making a pretty extreme version of the statement.)
    • The reason is like this: "legible" means something like "readily relates to many things, and to standard/common things". If there's a core thing which is alien to your understanding, the legible emanations from that core are almost necessarily somewhat remote from the core. The emanations can be on a path from here to there, but they also contain a lot of irrelevant stuff, and can maybe in principle be circumvented (by doing math-like reasoning), so to speak.
    • So looking at the bytecode of a compiled python program does give you some access to the concepts involved in the python program itself, but those concepts are refracted through the compiler, so what you're seeing in the bytecode has a lot of structure that's interesting and useful and relevant to thinking about programs more generally, but is not really specifically relevant to the concepts involved in this specific python program. (A small illustrative sketch of this is below, after this list.)
  • Concretely in the case of linguistics, there's an upstream core which is something like "internal automatic conceptual engineering to serve life tasks and play tasks".
    • ((This pointer is not supposed to, by itself, distinguish the referent from other things that sound like they fit the pointer taken as a description; e.g., fine, you can squint and reasonably say that some computer RL thing is doing "internal automatic..." but I claim the human thing is different and more powerful, and I'm just trying to point at that as distinct from speech.))
    • That upstream core has emanations / compilations / manifestations in speech, writing, internal monologue. The emanations have lots of structure. Some of that structure is actually relevant to the core. A lot of that structure is not very relevant, but is instead mostly about the collision of the core dynamics with other constraints.
    • Phonotactics is interesting, but even though it can be applied to describe how morphemes interact in the arena of speech, I don't think we should expect it to tell us much about morphemes; the additional complexity is about sounds and ears and mouths, and not about morphemes.
    • A general theory about how the cognitive representations of "assassin" and "assassinate" overlap and disoverlap is interesting, but even though it can be applied to describe how ideas interact in the arena of word-production, I don't think we should expect it to tell us much about ideas; the additional complexity is about fast parallel datastructures, and not about ideas.
    • In other words, all the "core of how minds work" is hidden somewhere deep inside whatever [CAT] refers to.
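
For concreteness, a small sketch of the bytecode point above (this is just my attempt to concretize the analogy, nothing load-bearing):

```python
# Minimal illustration: a simple high-level idea compiles to bytecode whose
# visible structure is mostly about the compiler and the virtual machine
# (stack slots, jumps), not about the idea itself.
import dis

def sum_of_squares(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

dis.dis(sum_of_squares)
# The disassembly shows opcodes like LOAD_FAST and FOR_ITER (exact names vary
# by Python version): real, interesting structure, but only obliquely related
# to the concept "sum of squares".
```
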
Comment by TsviBT on Koan: divining alien datastructures from RAM activations · 2024-07-21T22:48:09.219Z · LW · GW

Then whatever that's doing is a constraint in itself, and I can start off by going looking for patterns of activation that correspond to e.g. simple-but-specific mathematical operations that I can actuate in the computer.

It's an interesting different strategy, but I think it's a bad strategy. I think in the analogy this corresponds to doing something like psychophysics, or studying the algorithms involved in grammatically parsing a sentence; which is useful and interesting in a general sense, but isn't a good way to get at the core of how minds work.

if your hypothesis were correct, Euler would not have had to invent topology in the 1700s

(I don't understand the basic logic here--probably easier to chat about it later, if it's a live question later.)

Comment by TsviBT on Koan: divining alien datastructures from RAM activations · 2024-07-21T09:59:19.824Z · LW · GW

Thinking about it more, I want to poke at the foundations of the koan. Why are we so sure that this is a computer at all? What permits us this certainty, that this is a computer, and that it is also running actual computation rather than glitching out?

Why do you need to be certain? Say there's a screen showing a nice "high-level" interface that provides substantial functionality (without directly revealing the inner workings, e.g. there's no shell). Something like that should be practically convincing.

hash functions are meant to be maximally difficult,

I think the overall pattern of RAM activations should still tip you off, if you know what you're looking for. E.g. you can see the pattern of collisions, and see the pattern of when the table gets resized. Not sure the point is that relevant, though; we could also talk about an algorithm that doesn't use especially-obscured components.
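
For concreteness, a minimal sketch (illustrative only) of the kind of externally visible pattern I mean: even without reading the hash function, a growing hash table betrays its resize events through its memory footprint.

```python
# Illustrative: watch a Python dict's memory footprint jump at predictable fill
# levels as it gets resized. The exact thresholds depend on the CPython
# version, but the qualitative "pattern of when the table gets resized" is
# visible without ever inspecting the hash function itself.
import sys

d = {}
last_size = sys.getsizeof(d)
for i in range(10_000):
    d[i] = None
    size = sys.getsizeof(d)
    if size != last_size:
        print(f"resize after {i + 1} insertions: {last_size} -> {size} bytes")
        last_size = size
```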

Doing so still never gets you to the idea of a homology sphere, and it isn't enough to point towards the mathematically precise definition of an infinite 3-manifold without boundary.

I'm unsure about that, but the more pertinent questions are along the lines of "is doing so the first (in understanding-time) available, or fastest, way to make the first few steps along the way that leads to these mathematically precise definitions?" The conjecture here is "yes".

Comment by TsviBT on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T22:07:38.697Z · LW · GW

But yeah if you mean "I don't think it scales to successfully staking out territory around a grift" that seems right.

Comment by TsviBT on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T22:02:44.624Z · LW · GW

No, it's the central example for what would work in alignment. You have to think about the actual problem. The difficulty of the problem and illegibility of intermediate results means eigening becomes dominant, but that's a failure mode.

Comment by TsviBT on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T21:20:47.390Z · LW · GW

If everyone calculates 67*23 in their head, they'll reach a partial consensus. People who disagree with the consensus can ask for an argument, and they'll get a convincing argument which will convince them of the correct answer; and if the argument is unconvincing, and they present a convincing argument for a different answer, that answer will become the consensus. We thus arrive at consensus with no eigening. If this isn't how things play out, it's because there's something wrong with the consensus / with the people's epistemics.
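
For concreteness, the worked calculation (the kind of short, checkable argument I mean):

$$67 \times 23 = 67 \times 20 + 67 \times 3 = 1340 + 201 = 1541.$$

Anyone who disagrees can be walked through that in a few seconds, so the consensus settles on 1541 without anyone needing to weigh anyone else's reputation.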

Comment by TsviBT on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T21:16:06.707Z · LW · GW

This is a reasonable question, but seems hard to answer satisfyingly. Maybe something with a similar spirit to "stands up to multiple rounds of cross-examination and hidden-assumption-explicitization".

Comment by TsviBT on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T14:57:31.170Z · LW · GW

A different way of arriving at consensus? I'm kind of annoyed that there's apparently a practice of not proactively thinking of examples, but ok:

  1. If ~everyone is deferring, then they'll converge on some combination of whoever isn't deferring and whatever belief-like objects emerge from the depths in that context.
  2. If ~everyone just wishes to be paid and the payers pay for X, then ~everyone will apparently believe X.
  3. If someone is going around threatening people into believing X, then people will believe X.
Comment by TsviBT on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T14:46:51.241Z · LW · GW

It's almost orthogonal to eigen-evaluation. You can arrive at consensus in lots of ways.

Comment by TsviBT on How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation") · 2024-07-19T09:53:29.096Z · LW · GW

I didn't read most of the post but it seems like you left out a little known but potentially important way to know whether research is good, which is something we could call "having reasons for thinking that your research will help with AGI alignment and then arguing about those reasons and seeing which reasons make sense".

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T15:46:08.462Z · LW · GW

it should be straightforward (and more importantly, should not take so much time that it becomes daunting) to give reasons for that

NOPE!

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T15:37:51.733Z · LW · GW

to think that relaxing norms around the way in which particular kinds of information is communicated will not negatively affect the quality of the conversation that unfolds afterwards.

If this happens because someone says something true, relevant, and useful, in a way that doesn't have alternative expressions that are really easy and obvious to do (such as deleting the statement "So and so is a doo-doo head"), then it's the fault of the conversation, not the statement.

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T15:13:58.262Z · LW · GW

I'd be open to alternative words for "insane" the way I intended it.

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T15:10:39.085Z · LW · GW

I doubt that we're going to get anything useful here, but as an indication of where I'm coming from:

  1. I would basically agree with what you're saying if my first comment had been ad hominem, like "Bogdan is a doo-doo head". That's unhelpful, irrelevant, mean, inflammatory, and corrosive to the culture. (Also it's false lol.)
  2. I think a position can be wrong, can be insanely wrong (which means something like "is very far from the truth, is wrong in a way that produces very wrong actions, and is being produced by a process which is failing to update in a way that it should and is failing to notice that fact"), and can be exactly opposite of the truth (for example, "Redwoods are short, grass is tall" is, perhaps depending on contexts, just about the exact opposite of the truth). And these facts are often knowable and relevant if true. And therefore should be said--in a truth-seeking context. And this is the situation we're in.
  3. If you had responded to my original comment with something like

"Your choice of words makes it seem like you're angry or something, and this is coming out in a way that seems like a strong bid for something, e.g. attention or agreement or something. It's a bit hard to orient to that because it's not clear what if anything you're angry about, and so readers are forced to either rudely ignore / dismiss, or engage with someone who seems a bit angry or standoffish without knowing why. Can you more directly say what's going on, e.g. what you're angry about and what you might request, so we can evaluate that more explicitly?"

or whatever is the analogous thing that's true for you, then we could have talked about that. Instead you characterized my relatively accurate and intentional presentation of my views as "misleading readers into thinking the case you are bringing forward is stronger than it actually is or that this matter is so obvious and trivial...", which sounds to me like you have a problem in your own thinking and norms of discourse, which is that you're requiring that statements other people make be from the perspective of [the theory that's shared between the expected community of speakers and listeners] in order for you to think they're appropriate or non-misleading.

  4. The fact that I have to explain this to you is probably bad, and is probably mostly your responsibility, and you should reevaluate your behavior. (I'm not trying to be gentle here, and if gentleness would help then you deserve it--but you probably won't get it here from me.)
Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T14:44:30.878Z · LW · GW

I'd like to understand what it is that has held you back from speed reading external work for hunch seeding for so long.

Well currently I'm not really doing alignment research. My plans / goals / orientation / thinking style have changed over the years, so I've read stuff or tried to read stuff more or less during different periods. When I'm doing my best thinking, yes, I read things for idea seeding / as provocations, but it's only that--I most certainly am not speed reading, the opposite really: read one paragraph, think for an hour and then maybe write stuff. And I'm obviously not reading some random ML paper, jesus christ. Philosophy, metamathematics, theoretical biology, linguistics, psychology, ethology, ... much more interesting and useful.

To me, it seems like solving from scratch is best done not from scratch, if that makes sense.

Absolutely, I 100% agree, IIUC. I also think:

  1. A great majority of the time, when people talk about reading stuff (to "get up to speed", to "see what other people have done on the subject", to "get inspiration", to "become more informed", to "see what approaches/questions there are"...), they are not doing this "from scratch not from scratch" thing.
  2. "the typical EA / rationalist, especially in AI safety research (most often relatively young and junior in terms of research experience / taste)" is absolutely and pretty extremely erring on the side of failing to ever even try to solve the actual problem at all.

Don't defer to what you read.

Yeah, I generally agree (https://tsvibt.blogspot.com/2022/09/dangers-of-deferrence.html), though you probably should defer about some stuff at least provisionally (for example, you should probably try out, for a while, the stance of deferring to well-respected philosophers about what questions are interesting).

I think it's just not appreciated how much people defer to what they read. Specifically, there's a lot of frame deference. This is usually fine and good in lots of contexts (you don't need to, like, question epistemology super hard to become a good engineer, or question whether we should actually be basing our buildings off of liquid material rather than solid material or something). It's catastrophic in AGI alignment, because our frames are bad.

Not sure I answered your question.

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T10:10:20.516Z · LW · GW

considerably-better-than-average work on trying to solve the problem from scratch

It's considerably better than average but is a drop in the bucket and is probably mostly wasted motion. And it's a pretty noncentral example of trying to solve the problem from scratch. I think most people reading this comment just don't even know what that would look like.

even for someone interested in this agenda

At a glance, this comment seems like it might be part of a pretty strong case that [the concrete ML-related implications of NAH] are much better investigated by the ML community compared to LW alignment people. I doubt that the philosophically more interesting aspects of Wentworth's perspectives relating to NAH are better served by looking at ML stuff, compared to trying from scratch or looking at Wentworth's and related LW-ish writing. (I'm unsure about the mathematically interesting aspects; the alternative wouldn't be in the ML community but would be in the mathematical community.)

And most importantly "someone interested in this agenda" is already a somewhat nonsensical or question-begging conditional. You brought up "AI safety research" specifically, and by that term you are morally obliged to mean [the field of study aimed at figuring out how to make cognitive systems that are more capable than humanity and also serve human value]. That pursuit is better served by trying from scratch. (Yes, I still haven't presented an affirmative case. That's because we haven't even communicated about the proposition yet.)

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T08:19:14.661Z · LW · GW

I disagree re/ the word "insane". The position to which I stated a counterposition is insane.

"it's exactly opposite of the truth" and "absolutely" not only fails to help your case, but in my view actively makes things worse by using substance-free rhetoric that misleads readers into thinking the case you are bringing forward is stronger than it actually is or that this matter is so obvious and trivial that they shouldn't even need to think very hard about it before taking your side.

I disagree; I think I should state my actual position. The phrases you quoted have meaning and convey my position more than if they were removed.

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-16T08:16:50.695Z · LW · GW

The comment I was responding to also didn't offer serious relevant arguments.

https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html

Comment by TsviBT on Most smart and skilled people are outside of the EA/rationalist community: an analysis · 2024-07-15T15:33:38.135Z · LW · GW

especially in AI safety research

This is insanely wrong; it's exactly opposite of the truth. If you want to do something cool in the world, you should learn more stuff from what other humans have done. If, on the other hand, you want to solve the insanely hard engineering/philosophy problem of AGI alignment in time for humanity to not be wiped out, you absolutely should prioritize solving the problem from scratch.

Comment by TsviBT on Daniel Kokotajlo's Shortform · 2024-07-11T02:01:31.308Z · LW · GW

If anyone's interested in thinking through the basic issues and speculating about possibilities, DM me and let's have a call.

Comment by TsviBT on TsviBT's Shortform · 2024-06-19T05:37:55.634Z · LW · GW

Thanks. (I think we have some ontological mismatches which hopefully we can discuss later.)

Comment by TsviBT on I would have shit in that alley, too · 2024-06-18T04:48:43.403Z · LW · GW

Thanks

Comment by TsviBT on TsviBT's Shortform · 2024-06-17T07:30:21.960Z · LW · GW

Well, it's a quick take. My blog has more detailed explanations, though not organized around this particular point.

Comment by TsviBT on Fabien's Shortform · 2024-06-17T04:05:17.003Z · LW · GW

I vaguely second this. My (intuitive, sketchy) sense is that Fabien has the ready capacity to be high integrity. (And I don't necessarily mind kinda mixing expectation with exhortation about that.) A further exhortation for Fabien: insofar as it feels appropriate, keep your eyes open, looking at both yourself and others, for "large effects on your cognition one way or another"--"black box" (https://en.wikipedia.org/wiki/Flight_recorder) info about such contexts is helpful for the world!

Comment by TsviBT on TsviBT's Shortform · 2024-06-17T02:25:55.676Z · LW · GW

I have no idea what goes on in the limit, and I would guess that what determines the ultimate effects (https://tsvibt.blogspot.com/2023/04/fundamental-question-what-determines.html) would become stable in some important senses. Here I'm mainly saying that the stuff we currently think of as being core architecture would be upturned.

I mean it's complicated... like, all minds are absolutely subject to some constraints--there's some Bayesian constraint, like you can't "concentrate caring in worlds" in a way that correlates too much with "multiversally contingent" facts, compared to how much you've interacted with the world, or something... IDK what it would look like exactly, and if no one else knows then that's kinda my point. Like, there's

  1. Some math about probabilities, which is just true--information-theoretic bounds and such. But: not clear precisely how this constrains minds in what ways. (One standard example of such a bound is written out just after this list.)
  2. Some rough-and-ready ways that minds are constrained in practice, such as obvious stuff about like you can't know what's in the cupboard without looking, you can't shove more than such and such amount of information through a wire, etc. These are true enough in practice, but also can be broken in terms of their relevant-in-practice implications (e.g. by "hypercompressing" images using generative AI; you didn't truly violate any law of probability but you did compress way beyond what would be expected in a mundane sense).
  3. You can attempt to state more absolute constraints, but IDK how to do that. Naive attempts just don't work, e.g. "you can't gain information just by sitting there with your eyes closed" just isn't true in real life for any meaning of "information" that I know how to state other than a mathematical one (because for example you can gain "logical information", or because you can "unpack" information you already got (which is maybe "just" gaining logical information but I'm not sure, or rather I'm not sure how to really distinguish non/logical info), or because you can gain/explicitize information about how your brain works which is also information about how other brains work).
  4. You can describe or design minds as having some architecture that you think of as Bayesian. E.g. writing a Bayesian updater in code. But such a program would emerge / be found / rewrite itself so that the hypotheses it entertains, in the descriptive Bayesian sense, are not the things stored in memory and pointed at by the "hypotheses" token in your program.
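
(As one standard example of the "just true" math in 1.--stating a textbook bound, not a claim about how it constrains minds: if a mind's internal state $M$ depends on the world $W$ only through its observations $O$, the data processing inequality and channel capacity give

$$I(M; W) \le I(O; W) \le C \cdot T,$$

where $C$ is the capacity of the sensory channel and $T$ the number of channel uses. The hard part, as 1. says, is saying precisely what bounds like this do and don't imply about a mind.)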

Another class of constraints like this are those discussed in computational complexity theory.

So there are probably constraints, but we don't really understand them and definitely don't know how to wield them, and in particular we understand the ones about goal-pursuits much less well than we understand the ones about probability.

Comment by TsviBT on TsviBT's Shortform · 2024-06-17T01:33:21.640Z · LW · GW

I'd go stronger than just "not for certain, not forever", and I'd worry you're not hearing my meaning (agree or not). I'd say in practice more like "pretty soon, with high likelihood, in a pretty deep / comprehensive / disruptive way". E.g. human culture isn't just another biotic species (you can make interesting analogies but it's really not the same).

Comment by TsviBT on TsviBT's Shortform · 2024-06-17T01:30:11.574Z · LW · GW

We'd have to talk more / I'd have to read more of what you wrote, for me to give a non-surface-level / non-priors-based answer, but on priors (based on, say, a few dozen conversations related to multiple agency) I'd expect that whatever you mean by hierarchical agency is dodging the problem. It's just more homunculi. It could serve as a way in / as a centerpiece for other thoughts you're having that are more so approaching the problem, but the hierarchicalness of the agency probably isn't actually the relevant aspect. It's like if someone is trying to explain how a car goes and then they start talking about how, like, a car is made of four wheels, and each wheel has its own force that it applies to a separate part of the road in some specific position and direction and so we can think of a wheel as having inside of it, or at least being functionally equivalent to having inside of it, another smaller car (a thing that goes), and so a car is really an assembly of 4 cars. We're just... spinning our wheels lol.

Just a guess though. (Just as a token to show that I'm not completely ungrounded here w.r.t. multi-agency stuff in general, but not saying this addresses specifically what you're referring to: https://tsvibt.blogspot.com/2023/09/the-cosmopolitan-leviathan-enthymeme.html)

Comment by TsviBT on TsviBT's Shortform · 2024-06-17T01:00:08.631Z · LW · GW

Say you have a Bayesian reasoner. It's got hypotheses; it's got priors on them; it's got data. So you watch it doing stuff. What happens? Lots of stuff changes, tide goes in, tide goes out, but it's still a Bayesian, can't explain that. The stuff changing is "not deep". There's something stable though: the architecture in the background that "makes it a Bayesian". The update rules, and the rest of the stuff (for example, whatever machinery takes a hypothesis and produces "predictions" which can be compared to the "predictions" from other hypotheses). And: it seems really stable? Like, even reflectively stable, if you insist?
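
(A toy sketch of that picture, purely illustrative: the posterior swings around with the data, but the update loop--the part that "makes it a Bayesian"--never changes.)

```python
# Toy Bayesian reasoner: posteriors change with every observation ("tide goes
# in, tide goes out"), but the architecture in the background -- the hypothesis
# class, the likelihoods, the update rule -- stays fixed no matter what data
# arrives.
hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]                    # possible coin biases
posterior = {h: 1 / len(hypotheses) for h in hypotheses}  # uniform prior

def update(posterior, observation):
    """One Bayes update; this function is the stable, 'deep' part."""
    likelihood = {h: (h if observation == "heads" else 1 - h) for h in posterior}
    unnormalized = {h: posterior[h] * likelihood[h] for h in posterior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

for obs in ["heads", "heads", "tails", "heads"]:
    posterior = update(posterior, obs)
    print({h: round(p, 3) for h, p in posterior.items()})
```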

So does this solve stability? I would say, no. You might complain that the reason it doesn't solve stability is just that the thing doesn't have goal-pursuits. That's true but it's not the core problem. The same issue would show up if we for example looked at the classical agent architecture (utility function, counterfactual beliefs, argmaxxing actions).

The problem is that the agency you can write down is not the true agency. "Deep change" is change that changes elements that you would have considered deep, core, fundamental, overarching... Change that doesn't fit neatly into the mind, change that isn't just another piece of data that updates some existing hypotheses. See https://tsvibt.blogspot.com/2023/01/endo-dia-para-and-ecto-systemic-novelty.html

Comment by TsviBT on TsviBT's Shortform · 2024-06-16T23:22:54.640Z · LW · GW

An important thing that the AGI alignment field never understood:

Reflective stability. Everyone thinks it's about, like, getting guarantees, or something. Or about rationality and optimality and decision theory, or something. Or about how we should understand ideal agency, or something.

But what I think people haven't understood is

  1. If a mind is highly capable, it has a source of knowledge.
  2. The source of knowledge involves deep change.
  3. Lots of deep change implies lots of strong forces (goal-pursuits) operating on everything.
  4. If there's lots of strong goal-pursuits operating on everything, nothing (properties, architectures, constraints, data formats, conceptual schemes, ...) sticks around unless it has to stick around.
  5. So if you want something to stick around (such as the property "this machine doesn't kill all humans") you have to know what sort of thing can stick around / what sort of context makes things stick around, even when there are strong goal-pursuits around, which is a specific thing to know because most things don't stick around.
  6. The elements that stick around and help determine the mind's goal-pursuits have to do so in a way that positively makes them stick around (reflective stability of goals).

There's exceptions and nuances and possible escape routes. And the older Yudkowsky-led research about decision theory and tiling and reflective probability is relevant. But this basic argument is in some sense simpler (less advanced, but also more radical ("at the root")) than those essays. The response to the failure of those essays can't just be to "try something else about alignment"; the basic problem is still there and has to be addressed.

(related elaboration: https://tsvibt.blogspot.com/2023/01/a-strong-mind-continues-its-trajectory.html https://tsvibt.blogspot.com/2023/01/the-voyage-of-novelty.html )

Comment by TsviBT on microwave drilling is impractical · 2024-06-13T04:39:39.285Z · LW · GW

I guess that's right... what if you have a series of pumps in the same pipe, say one every kilometer?

Comment by TsviBT on microwave drilling is impractical · 2024-06-13T02:11:52.952Z · LW · GW

So, the deeper the hole is, the higher the air pressure needs to be.
 

IDK about physics but would it help to have another pipe that is a vacuum? (Like, hooked up to a vacuum pump stationed on ground level.) So then you don't need such a high pressure at the bottom?  

Comment by TsviBT on jacquesthibs's Shortform · 2024-06-11T14:51:22.678Z · LW · GW

It's clear from Sutton's original article. https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce#The_bitter_lesson_and_the_success_of_scaling

Comment by TsviBT on The Data Wall is Important · 2024-06-10T19:43:55.349Z · LW · GW

I don't think my meme image is a good argument against all arguments that stuff will go up. But I don't think it's "straw manning the argument". The argument given is often pretty much literally just "look, it's been going up", maybe coupled with some mumbling about how "people are working on the data problem something something self-play something something synthetic data".

But progress continuing in a roughly linear fashion between now and 2027 seems, to me, totally "strikingly plausible."

Do you think my image disagrees with that? Look again.

Comment by TsviBT on The Data Wall is Important · 2024-06-10T09:40:02.197Z · LW · GW
[Image: log-scale trend chart captioned "the trendlines have been astonishingly consistent, despite naysayers at every turn", with an x-axis spanning roughly 2018–2026.]
Comment by TsviBT on Demystifying "Alignment" through a Comic · 2024-06-09T19:42:44.815Z · LW · GW

Since the author seems to have been discouraged at one point:

The good: The images (the blobs) are really good. Cute, quirky, engaging, in some cases good explainers. Overall the production value seems high.

The confusing: I'm not familiar with manga, e.g. its idioms or vibes; so there's probably stuff you were going for that I just wouldn't understand.

The probably bad: I'm guessing that the writing has too many leaps that are too unclear. I could imagine this being a style, like in a fictional work where things are sort of referenced without really being explained but referenced in a way that makes them "feel real" to the reader (and maybe makes them go look it up). But I'd guess the comic doesn't hit that as-is. Possibly the comic would benefit from test readers (preferably from your target audience) who you talk with to see where they got bored / confused.

Comment by TsviBT on Announcing ILIAD — Theoretical AI Alignment Conference · 2024-06-05T21:32:42.813Z · LW · GW

Insightful

Learning

Implore

Agreed

Delta

Comment by TsviBT on Announcing ILIAD — Theoretical AI Alignment Conference · 2024-06-05T21:08:29.995Z · LW · GW

idk, sounds dangerously close to deferences

Comment by TsviBT on Announcing ILIAD — Theoretical AI Alignment Conference · 2024-06-05T20:02:40.516Z · LW · GW

honestly i prefer unconferences

Comment by TsviBT on Seth Herd's Shortform · 2024-06-02T23:04:06.569Z · LW · GW

Yeah I think there's a miscommunication. We could try having a phone call.

A guess at the situation is that I'm responding to two separate things. One is the story here:

One mainstay of claiming alignment is near-impossible is the difficulty of "solving ethics" - identifying and specifying the values of all of humanity. I have come to think that this is obviously (in retrospect - this took me a long time) irrelevant for early attempts at alignment: people will want to make AGIs that follow their instructions, not try to do what all of humanity wants for all of time. This also massively simplifies the problem; not only do we not have to solve ethics, but the AGI can be corrected and can act as a collaborator in improving its alignment as we collaborate to improve its intelligence.

It does simplify the problem, but not massively relative to the whole problem. A harder part shows up in the task of having a thing that

  1. is capable enough to do things that would help humans a lot, like a lot a lot, whether or not it actually does those things, and
  2. doesn't kill everyone / destroy approximately all human value.

And I'm not pulling a trick on you where I say that X is the hard part, and then you realize that actually we don't have to do X, and then I say "Oh wait actually Y is the hard part". Here is a quote from "Coherent Extrapolated Volition", Yudkowsky 2004 https://intelligence.org/files/CEV.pdf:

  1. Solving the technical problems required to maintain a well-specified abstract invariant in a self-modifying goal system. (Interestingly, this problem is relatively straightforward from a theoretical standpoint.)
  2. Choosing something nice to do with the AI. This is about midway in theoretical hairiness between problems 1 and 3.
  3. Designing a framework for an abstract invariant that doesn’t automatically wipe out the human species. This is the hard part.

I realize now that I don't know whether or not you view IF as trying to address this problem.

The other thing I'm responding to is:

the AGI can be corrected and can act as a collaborator in improving its alignment as we collaborate to improve its intelligence.

If the AGI can (relevantly) act as a collaborator in improving its alignment, it's already a creative intelligence on par with humanity. Which means there was already something that made a creative intelligence on par with humanity. Which is probably fast, ongoing, and nearly inextricable from the mere operation of the AGI.

I also now realize that I don't know how much of a crux for you the claim that you made is.

Comment by TsviBT on MIRI 2024 Communications Strategy · 2024-06-02T10:21:00.629Z · LW · GW

I personally have updated a fair amount over time on

  • people (going on) expressing invalid reasoning for their beliefs about timelines and alignment;
  • people (going on) expressing beliefs about timelines and alignment that seemed relatively more explicable via explanations other than "they have some good reason to believe this that I don't know about";
  • other people's alignment hopes and mental strategies have more visible flaws and visible doomednesses;
  • other people mostly don't seem to cumulatively integrate the doomednesses of their approaches into their mental landscape as guiding elements;
  • my own attempts to do so fail in a different way, namely that I'm too dumb to move effectively in the resulting modified landscape.

We can back out predictions of my personal models from this, such as "we will continue to not have a clear theory of alignment" or "there will continue to be consensus views that aren't supported by reasoning that's solid enough that it ought to produce that consensus if everyone is being reasonable".

Comment by TsviBT on Non-Disparagement Canaries for OpenAI · 2024-06-02T05:16:02.567Z · LW · GW

That's another main possibility. I don't buy the reasoning in general though--integrity is just super valuable. (Separately I'm aware of projects that are very important and neglected (legibly so) without being funded, so I don't overall believe that there are a bunch of people strategically capitulating to anti-integrity systems in order to fund key projects.) Anyway, my main interest here is to say that there is a real, large-scale, ongoing problem(s) with the social world, which increases X-risk; it would be good for some people to think clearly about that; and it's not good to be satisfied with false / vague / superficial stories about what's happening.

Comment by TsviBT on Non-Disparagement Canaries for OpenAI · 2024-06-02T03:58:08.249Z · LW · GW

I'm interpreting "realize" colloquially, as in, "be aware of". I don't think the people discussed in the post just haven't had it occur to them that pre-singularity wealth doesn't matter because a win singularity society very likely wouldn't care much about it. Instead someone might, for example...

  • ...care a lot about their and their people's lives in the next few decades.
  • ...view it as being the case that [wealth mattering] is dependent on human coordination, and not trust others to coordinate like that. (In other words: the "stakeholders" would have to all agree to cede de facto power from themselves, to humanity.)
  • ...not agree that humanity will or should treat wealth as not mattering; and instead intend to pursue a wealthy and powerful position mid-singularity, with the expectation of this strategy having large payoffs.
  • ...be in some sort of mindbroken state (in the genre of Moral Mazes), such that they aren't really (say, in higher-order derivatives) modeling the connection between actions and long-term outcomes, and instead are, I don't know, doing something else, maybe involving arbitrary obeisance to power.

I don't know what's up with people, but I think it's potentially important to understand deeply what's up with people, without making whatever assumption goes into thinking that IF someone only became aware of this vision of the future, THEN they would adopt it.

(If Tammy responded that "realize" was supposed to mean the etymonic sense of "making real" then I'd have to concede.)

Comment by TsviBT on Seth Herd's Shortform · 2024-06-02T01:23:59.867Z · LW · GW

the AGI can be corrected and can act as a collaborator in improving its alignment as we collaborate to improve its intelligence.

Why do you think you can get to a state where the AGI is materially helping to solve extremely difficult problems (not extremely difficult like chess, extremely difficult like inventing language before you have language), and also the AGI got there due to some process that doesn't also immediately cause there to be a much smarter AGI? https://tsvibt.blogspot.com/2023/01/a-strong-mind-continues-its-trajectory.html

Comment by TsviBT on MIRI 2024 Communications Strategy · 2024-05-31T20:46:54.903Z · LW · GW

IDK if there's political support that would be helpful and that could be affected by people saying things to their representatives. But if so, then it would be helpful to have a short, clear, on-point letter that people can adapt to send to their representatives. Things I'd want to see in such a letter:

  1. AGI, if created, would destroy all or nearly all human value.
  2. We aren't remotely on track to solving the technical problems that would need to be solved in order to build AGI without destroying all or nearly all human value.
  3. Many researchers say they are trying to build AGI and/or doing research that materially contributes toward building AGI. None of those researchers has a plausible plan for making AGI that doesn't destroy all or nearly all human value.
  4. As your constituent, I don't want all or nearly all human value to be destroyed.
  5. Please start learning about this so that you can lend your political weight to proposals that would address existential risk from AGI.
  6. This is more important to me than all other risks about AI combined.

Or something.

Comment by TsviBT on Non-Disparagement Canaries for OpenAI · 2024-05-30T20:33:51.470Z · LW · GW

I wish you would realize that whatever we're looking at, it isn't people not realizing this.

Comment by TsviBT on Talent Needs of Technical AI Safety Teams · 2024-05-29T22:39:04.335Z · LW · GW

Look... Consider the hypothetically possible situation that in fact everyone is very far from being on the right track, and everything everyone is doing doesn't help with the right track and isn't on track to get on the right track or to help with the right track.

Ok, so I'm telling you that this hypothetically possible situation seems to me like the reality. And then you're, I don't know, trying to retreat to some sort of agreeable live-and-let-live stance, or something, where we all just agree that due to model uncertainty and the fact that people have vaguely plausible stories for how their thing might possibly be helpful, everyone should do their own thing and it's not helpful to try to say that some big swath of research is doomed? If this is what's happening, then I think that what you in particular are doing is a bad thing to do here.

Maybe we can have a phone call if you'd like to discuss further.