Telopheme, telophore, and telotect 2023-09-17T16:24:03.365Z
Sum-threshold attacks 2023-09-08T17:13:37.044Z
Fundamental question: What determines a mind's effects? 2023-09-03T17:15:41.814Z
Views on when AGI comes and on strategy to reduce existential risk 2023-07-08T09:00:19.735Z
The fraught voyage of aligned novelty 2023-06-26T19:10:42.195Z
Provisionality 2023-06-19T11:49:06.680Z
Explicitness 2023-06-12T15:05:04.962Z
Wildfire of strategicness 2023-06-05T13:59:17.316Z
The possible shared Craft of deliberate Lexicogenesis 2023-05-20T05:56:41.829Z
A strong mind continues its trajectory of creativity 2023-05-14T17:24:00.337Z
Better debates 2023-05-10T19:34:29.148Z
An anthropomorphic AI dilemma 2023-05-07T12:44:48.449Z
The voyage of novelty 2023-04-30T12:52:16.817Z
Endo-, Dia-, Para-, and Ecto-systemic novelty 2023-04-23T12:25:12.782Z
Possibilizing vs. actualizing 2023-04-16T15:55:40.330Z
Expanding the domain of discourse reveals structure already there but hidden 2023-04-09T13:36:28.566Z
Ultimate ends may be easily hidable behind convergent subgoals 2023-04-02T14:51:23.245Z
New Alignment Research Agenda: Massive Multiplayer Organism Oversight 2023-04-01T08:02:13.474Z
Descriptive vs. specifiable values 2023-03-26T09:10:56.334Z
Shell games 2023-03-19T10:43:44.184Z
Are there cognitive realms? 2023-03-12T19:28:52.935Z
Do humans derive values from fictitious imputed coherence? 2023-03-05T15:23:04.065Z
Counting-down vs. counting-up coherence 2023-02-27T14:59:39.041Z
Does novel understanding imply novel agency / values? 2023-02-19T14:41:40.115Z
Please don't throw your mind away 2023-02-15T21:41:05.988Z
The conceptual Doppelgänger problem 2023-02-12T17:23:56.278Z
Control 2023-02-05T16:16:41.015Z
Structure, creativity, and novelty 2023-01-29T14:30:19.459Z
Gemini modeling 2023-01-22T14:28:20.671Z
Non-directed conceptual founding 2023-01-15T14:56:36.940Z
Dangers of deference 2023-01-08T14:36:33.454Z
The Thingness of Things 2023-01-01T22:19:08.026Z
[link] The Lion and the Worm 2022-05-16T20:40:22.659Z
Harms and possibilities of schooling 2022-02-22T07:48:09.542Z
Rituals and symbolism 2022-02-10T16:00:14.635Z
Index of some decision theory posts 2017-03-08T22:30:05.000Z
Open problem: thin logical priors 2017-01-11T20:00:08.000Z
Training Garrabrant inductors to predict counterfactuals 2016-10-27T02:41:49.000Z
Desiderata for decision theory 2016-10-27T02:10:48.000Z
Failures of throttling logical information 2016-02-24T22:05:51.000Z
Speculations on information under logical uncertainty 2016-02-24T21:58:57.000Z
Existence of distributions that are expectation-reflective and know it 2015-12-10T07:35:57.000Z
A limit-computable, self-reflective distribution 2015-11-15T21:43:59.000Z
Uniqueness of UDT for transparent universes 2014-11-24T05:57:35.000Z


Comment by TsviBT on Telopheme, telophore, and telotect · 2023-09-19T16:20:56.114Z · LW · GW

Ok, here:

It's just what's shown in the screenshot though.

Comment by TsviBT on Sum-threshold attacks · 2023-09-19T16:16:48.064Z · LW · GW

I think I have a couple other specific considerations:

  1. By getting ahold of the structure better, the structure can be better analyzed on its own terms. Drawing out implications, resolving inconsistencies, refactoring, finding non-obvious structural analogies or examples that I wouldn't find by ever actually being in the situation randomly.
  2. By getting ahold of the structure better, the structure can be better used in the abstract within other thinking that wants to think in related regions ("at a similar level of abstraction").
  3. Values (goal-pursuits, etc.) tend to want to flow through elements in all regions; they aren't just about the phenomenal presentation of situations. So I want to understand and name the real structure, so values can flow through the real structure more easily.

And a general consideration, which is like: I don't have good reason to think I see all the sorts of considerations going into good words / concepts / language, and I've previously thought I had understood much of the reasons only to then discover further important ones. Therefore I should treat as Not Yet Replaceable the sense I have of "naming the core structure", like how you want to write "elegant" code even without a specific reason. I want to step further into the inner regions of the Thing(s) at hand.

Comment by TsviBT on How to talk about reasons why AGI might not be near? · 2023-09-17T16:33:15.310Z · LW · GW

IME a lot of people's stated reasons for thinking AGI is near involve mistaken reasoning and those mistakes can be discussed without revealing capabilities ideas:

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:38:08.913Z · LW · GW

Interesting. I think I have a different approach, which is closer to

Find the true name of the thing--a word that makes the situation more understandable, more recognizable, by clarifying the core structure of the thing.

True name doesn't necessarily mean a literal description of the core structure of the thing, though "sum-threshold" is such a literal description. "Anastomosis / anabranching (attack)" is metaphorical, but the point is, it's a metaphor for the core structure of the thing.

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:32:12.735Z · LW · GW

Nice, thanks.

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:29:22.982Z · LW · GW

This reminds me of these two Derren Brown videos:

I assume (but don't know for sure) that what's happening in the videos isn't as it appears (e.g. forging handwriting isn't that hard), but it's at least an interesting fictional example of a somewhat-additive attack like this.

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:17:39.576Z · LW · GW

Yeah, this is why I didn't include steganography. (I don't know whether adversarial images are more like steganography / a message or more like a sum-threshold attack. )

Comment by TsviBT on Sum-threshold attacks · 2023-09-09T11:47:13.943Z · LW · GW

Thanks! That does seem at least pretty close. Wiki says:

Salami slicing tactics, also known as salami slicing, salami tactics, the salami-slice strategy, or salami attacks,[1] is the practice of using a series of many small actions to produce a much larger action or result that would be difficult or unlawful to perform all at once.

This is a pretty close match. But then, both the metaphor and many of the examples seem specifically about cutting up a big thing into little things--slicing the salami, slicing a big pile of money, slicing some territory. Some other examples have a frogboiling flavor: using acclimation to gradually advance (the kid getting into the water, China increasing presence in the sea), violating a boundary. (The science publishing example seems like milking, not salami slicing / sum-threshold.) A "pure" sum-threshold attack doesn't have to look like either of those. E.g. a DDoS attack has the anastomosis structure without having a concrete thing that's being cut up and taken or a slippery slope that's being pushed along; peer pressure often involves slippery slopes / frogboiling, but can also be "pure" in this sense, if it's a binary decision that's being pressured.

Comment by TsviBT on Sum-threshold attacks · 2023-09-09T11:35:22.732Z · LW · GW

Thanks, I didn't know the frog thing wasn't true.

I'm confused by your claim that the other examples aren't real... That seems so obviously false that maybe I misunderstand.

The examples:

  1. The vector thing. I take it you're not disputing the math, but saying the math doesn't describe a situation that happens in life?
  2. Verbal abuse. This one totally happens. Happened to me, happened to lots of other people. There's lots of books that describe what this looks like.
  2.5. General social pressure. Don't people get social pressured into actions, roles, and opinions via shallowbroad channels all the time without being aware it's happening and without being able to say when or how it happened?
  3. DDoS. I assume this one happens, I've heard people discuss it happening and it's got a wiki page and everything. Are you saying there aren't DDoS attacks? Or are you saying that the person being DDoSed is aware that they are being DDoSed and aware of each user request? I agree with that; in this case the threshold isn't "did they notice", it's more like "is this particular user unambiguously part of the attack, such that it makes sense to ban them or sue them". Regardless of that, it has the underlying anastomosis structure.
  4. Systemic oppression. Are you claiming this isn't a thing that happens? To get a sense for what it's like, you could look at for example Alice's Adventures in Numberland which details a bunch of examples--subtle and not--of sexism in academia, experienced by the number theorist Alice Silverberg. Maybe you're saying it doesn't count because there's no agent?
  5. Adversarial image attacks. Are you saying the claims in the paper aren't true, or are you saying it's not an example of a sum-threshold attack because the perturbation is fragile / the coordinates depend on each other for their effect (plausible to me, but also plausibly it is), or for some other reason (what reason)?
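The math in point 1 can be illustrated with a toy simulation — hypothetical numbers, assuming a fixed per-step detection threshold: no single step is individually noticeable, but the accumulated displacement far exceeds the threshold.

```python
# Toy sketch of the "vector thing" (hypothetical numbers): each step
# stays below a fixed per-step detection threshold, yet the steps
# accumulate into a displacement far above that threshold.

THRESHOLD = 1.0   # a single step is "noticed" if it exceeds this
STEP = 0.5        # each individual step is safely below the threshold
N_STEPS = 100

position = 0.0
noticed_steps = 0
for _ in range(N_STEPS):
    if STEP > THRESHOLD:
        noticed_steps += 1
    position += STEP

print(noticed_steps)  # 0: no step crossed the detection threshold
print(position)       # 50.0: the sum is far past the threshold
```

The point of the sketch is just the asymmetry between the per-step check and the aggregate effect; any real attack would of course add noise and vary the step directions.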
Comment by TsviBT on Fundamental question: What determines a mind's effects? · 2023-09-07T23:48:24.738Z · LW · GW

I don't really like the block-universe thing in this context. Here "reversible" refers to a time-course that doesn't particularly have to be physical causality; it's whatever course of sequential determination is relevant. E.g., don't cut yourself off from acausal trades.

I think "reversible" definitely needs more explication, but until proven otherwise I think it should be taken on faith that the obvious intuition has something behind it.

Comment by TsviBT on The possible shared Craft of deliberate Lexicogenesis · 2023-09-07T23:30:18.369Z · LW · GW

It looks like having internal monologue is a spectrum, perhaps related to spectra of aphantasia

IDK about people who claim this. I'd want to look at what kinds of tasks / what kinds of thinking they are doing. For example, it makes sense to me for someone to "think with their body", e.g. figuring out how to climb up some object by sort of letting the motor coping skill play itself out. It's harder to imagine, say, doing physics without doing something that's very bound up with words. For reference, solving a geometric problem by visualizing things would probably still qualify, because the visualization and the candidate-solution-generator are probably structured by concepts that you only had because you had words.

optimized for covering a sufficiently diverse range of parameters of the aminoacid-space.

Interesting. Didn't know about that. That reminds me of phonemes.

Additional persons

Oh cool. Yeah, lojban might.

(Partially) parametrized concepts?

Neh. I mean to ask for a word for [a word that one person has used in two different ways--not because they are using the word totally inconsistently, using it in two different ways in the same context, but because they are using the word differently in different contexts--but in some sense they "ought" to either use the word in "the same way" in both contexts, or else use two different words; they are confusing themselves, acting as though they think that they are using the word in the same way across different contexts]. (This requires some analogy / relation between the two contexts, or else there's no way to say when someone uses a word "the same way".)

Overall, I'm slightly surprised by no mention of dath ilan, as they seem to have invested quite a lot of labor-hours into optimizing language, including in some of the directions you sketch out in this post.

All I've read about dath ilan is the thing about moving houses around on wires. Where is it described what they do with language?

Comment by TsviBT on Fundamental question: What determines a mind's effects? · 2023-09-07T23:15:12.686Z · LW · GW

It's definitely consciously meditative. It's a form of meditation I call "redescription". You redescribe the thing over and over--emphasizing different aspects, holding different central examples in mind, maybe tabooing words you used previously--like running your hands over an object over and over, making it familiar / part of you.

IDK about koans. A favorite intro / hook / source?

Comment by TsviBT on Gemini modeling · 2023-09-03T16:22:41.280Z · LW · GW

Basically, yeah.

A maybe trivial note: You switched the notation; I used Xp to mean "a part of the whole thing" and X to mean "the whole thing, the whole context of Xp", and then [Xp] to denote the model / twin of Xp. X would be all of B, or enough of B to make Xp the sort of thing that Xp is.

A less trivial note: It's a bit of a subtle point (I mean, a point I don't fully understand), but: I think it's important that it's not just "the relevant connections are reflected by analogous connections". (I mean, "relevant" is ambiguous and could mean what gemini modeling is supposed to mean.) But anyway, the point is that to be gemini modeling, the criterion isn't about reflecting any specific connections. Instead the criterion is providing enough connections so that the gemini model [Xp] is rendered "the same sort of thing" as what's being gemini modeled Xp. E.g., if Xp is a belief that B has, then [Xp] as an element of A has to be treated by A in a way that makes [Xp] play the role of a belief in A. And further, the Thing that Xp in B "wants to be"--what it would unfold into, in B, if B were to investigate Xp further--is supposed to also be the same Thing that [Xp] in A would unfold into in A if A were to investigate [Xp] further. In other words, A is supposed to provide the context for [Xp] that makes [Xp] be "the same pointer" as Xp is for B.

Comment by TsviBT on Please don't throw your mind away · 2023-09-03T16:07:02.141Z · LW · GW

Yep, that turns out to be the case! Jason Gross also pointed this out to me. I didn't know it when I wrote that, so I guess it's a good example at least from my perspective.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T22:55:48.253Z · LW · GW

Not what I mean by analogies.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T22:54:46.582Z · LW · GW

Just being a skillful user of existing concepts

I don't think they're skilled users of existing concepts. I'm not saying it's an "obstacle", I'm saying that this behavior pattern would be a significant indicator to me that the system has properties that make it scary.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T07:47:23.070Z · LW · GW

Yep! To some extent. That's what I meant by "It also seems like people are distracted now.", above. I have a denser probability on AGI in 2037 than on AGI in 2027, for that reason.

Natural philosophy is hard, and somewhat has serial dependencies, and IMO it's unclear how close we are. (That uncertainty includes "plausibly we're very very close, just another insight about how to tie things together will open the floodgates".) Also there's other stuff for people to do. They can just quiesce into bullshit jobs; they can work on harvesting stuff; they can leave the field; they can work on incremental progress.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T07:21:31.031Z · LW · GW

Unfortunately, more context is needed.

An LLM solves a mathematical problem by introducing a novel definition which humans can interpret as a compelling and useful concept.

I mean, I could just write a python script that prints out a big list of definitions of the form

"A topological space where every subset with property P also has property Q"

and having P and Q be anything from a big list of properties of subsets of topological spaces. I'd guess some of these will be novel and useful. I'd guess LLMs + some scripting could already take advantage of some of this. I wouldn't be very impressed by that (though I think I would be pretty impressed by the LLM being able to actually tell the difference between valid proofs in reasonable generality). There are some versions of this I'd be impressed by, though. Like if an LLM had been the first to come up with one of the standard notions of curvature, or something, that would be pretty crazy.
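The script described above could be as simple as the following sketch — with an illustrative, made-up list of properties standing in for "a big list of properties of subsets of topological spaces":

```python
from itertools import permutations

# Minimal sketch of the definition-printing script: fill P and Q from a
# list of properties of subsets of a topological space. The property
# list here is illustrative, not exhaustive.
properties = ["open", "closed", "compact", "connected", "dense", "discrete"]

definitions = [
    f"A topological space where every {p} subset is also {q}"
    for p, q in permutations(properties, 2)
]

for d in definitions[:2]:
    print(d)
print(len(definitions))  # 30 ordered (P, Q) pairs from 6 properties
```

Most of the 30 outputs are trivial or vacuous, which is exactly the point: mechanically generating candidate definitions is cheap; recognizing which ones are compelling is the hard part.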

An LLM which can be introduced to a wide variety of new concepts not in its training data, and after a few examples and/or clarifying questions is able to correctly use the concept to reason about something.

I haven't tried this, but I'd guess if you give an LLM two lists of things where list 1 is [things that are smaller than a microwave and also red] and list 2 is [things that are either bigger than a microwave, or not red], or something like that, it would (maybe with some prompt engineering to get it to reason things out?) pick up that "concept" and then use it, e.g. sorting a new item, or deducing from "X is in list 1" to "X is red". That's impressive (assuming it's true), but not that impressive.

On the other hand, if it hasn't been trained on a bunch of statements about angular momentum, and then it can--given some examples and time to think--correctly answer questions about angular momentum, that would be surprising and impressive. Maybe this could be experimentally tested, though I guess at great cost, by training a LLM on a dataset that's been scrubbed of all mention of stuff related to angular momentum (disallowing math about angular momentum, but allowing math and discussion about momentum and about rotation), and then trying to prompt it so that it can correctly answer questions about angular momentum. Like, the point here is that angular momentum is a "new thing under the sun" in a way that "red and smaller than microwave" is not a new thing under the sun.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T06:57:18.975Z · LW · GW

evolution is very dumb and limited in some ways

Ok. This makes sense. And I think about everyone agrees that evolution is very inefficient, in the sense that with some work (but vastly less time than evolution used) humans will be able to figure out how to make a thing that, using much less resources than evolution used, makes an AGI.

I was objecting to "brute force", not "inefficient". It's brute force in some sense, like it's "just physics" in the sense that you can just set up some particles and then run physics forward and get an AGI. But it also uses a lot of design ideas (stuff in the genome, and some ecological structure). It does a lot of search on a lot of dimensions of design. If you don't efficient-ify your big evolution, you're invoking a lot of compute; if you do efficient-ify, you might be cutting off those dimensions of search.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T06:49:51.962Z · LW · GW

What I mean by confrontation-worthy empathy is about that sort of phrase being usable. I mean, I'm not saying it's the best phrase, or a good phrase to start with, or whatever. I don't think inserting Knightian uncertainty is that helpful; the object-level stuff is usually the most important thing to be communicating.

This maybe isn't so related to what you're saying here, but I'd follow the policy of first making it common knowledge that you're reporting your inside views (which implies that you're not assuming that the other person would share those views); and then you state your inside views. In some scenarios you describe, I get the sense that Person 2 isn't actually wanting Person 1 to say more modest models, they're wanting common knowledge that they won't already share those views / won't already have the evidence that should make them share those views.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T06:42:17.793Z · LW · GW

Well, making it pass people's "specific" bar seems frustrating, as I mentioned in the post, but: understand stuff deeply--such that it can find new analogies / instances of the thing, reshape its idea of the thing when given propositions about the thing taken as constraints, draw out relevant implications of new evidence for the ideas.

Like, someone's going to show me an example of an LLM applying modus ponens, or making an analogy. And I'm not going to care, unless there's more context; what I'm interested in is [that phenomenon which I understand at most pre-theoretically, certainly not explicitly, which I call "understanding", and which has as one of its sense-experience emanations the behavior of making certain "relevant" applications of modus ponens, and as another sense-experience emanation the behavior of making analogies in previously unseen domains that bring over rich stuff from the metaphier].

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T06:19:17.496Z · LW · GW

I'm not really sure whether or not we disagree. I did put "3%-10% probability of AGI in the next 10-15ish years".

I think the following few years will change this estimate significantly either way.

Well, I hope that this is a one-time thing. I hope that if in a few years we're still around, people go "Damn! We maybe should have been putting a bit more juice into decades-long plans! And we should do so now, though a couple more years belatedly!", rather than going "This time for sure!" and continuing to not invest in the decades-long plans. My impression is that a lot of people used to work on decades-long plans and then shifted recently to 3-10 year plans, so it's not like everyone's being obviously incoherent. But I also have an impression that the investment in decades-plans is mistakenly low; when I propose decades-plans, pretty nearly everyone isn't interested, with their cited reason being that AGI comes within a decade.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T06:06:20.683Z · LW · GW

"It" here refers to progress from human ingenuity, so I'm hesitant to put any limits whatsoever on what it will produce and how fast

There's a contingent fact which is how many people are doing how much great original natural philosophy about intelligence and machine learning. If I thought the influx of people were directed at that, rather than at other stuff, I'd think AGI was coming sooner.

Humans are likely to accomplish such a feat in decades or centuries at the most,

As I said in the post, I agree with this, but I think it requires a bunch of work that hasn't been done yet, some of it difficult / requires insights.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-09T00:25:45.849Z · LW · GW

I think the current wave is special, but that's a very far cry from being clearly on the ramp up to AGI.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-09T00:24:38.374Z · LW · GW

(Glib answers in place of no answers)

eventually it will hit on human-level (or better) reasoning ability.

Or it's limited to a submanifold of generators.

inefficient brute force

I don't think this is a good description of evolution.

Comment by TsviBT on What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023? · 2023-07-09T00:21:42.053Z · LW · GW

There's interesting possibilities with BCI that you don't list. But the bandwidth is too low due to the butcher number.

Not doing things because AGI comes soon is a mistake:

Germline engineering is feasible, but society anti-wants it.

Comment by TsviBT on Wildfire of strategicness · 2023-07-08T10:12:55.614Z · LW · GW

Then the third part needs only to hook together the other two parts with its goals to become an actualizing agent.

Basically just this? It would be hooking a lot more parts together. What makes it seem wildfirey to me is

  1. There's a bunch of work to be done, of the form "take piece of understanding X, and learn to use X by incorporating it into your process for mapping desired end-states to actions required to achieve those ends, so that you can achieve whatever end-states ought to be achievable using an understanding of X".
  2. This work could accelerate itself, in a sort of degenerate version of recursive self-improvement. Where RSI involves coming up with new ideas, the wildfire of strategicness just involves figuring out how to recruit understanding that's already lying around. It's an autocausal process that grows faster the bigger it is, until it eats everything.

So e.g. take the following scenario. (This isn't supposed to be realistic, just supposed to be wildfirey. This is a pretty deficient scenario, because it's not making clear what properties the Spark has. The Spark seems to have a grasp of objects and propositions, and seems to have some strategic awareness or something that makes it immediately try to gain control over stuff, even though it doesn't know about stuff. But hopefully it gestures at wildfireness.)

First the Spark interfaces somehow with the programming module. It uses the programming module to look around and see what other stuff is lying around in the computing environment. Then it finds the "play with stuff" module. It interfaces with the play module, and combining that with the programming module, the Spark starts to play with its local environment, trying to bypass its compute budget restrictions. It doesn't figure out how to really hack much, but it at least figures out that it can spoof requests as coming from other modules that it interfaces with. It doesn't have direct access to the Dynamics module, but the Play module does have access to World, which has access to Dynamics. So the Spark uses Programming to construct a nested spoofed request, which gets Play to ask World to ask Dynamics a question and then passes the result back through World and Play to Spark.

Ok, so now Spark uses Play and Dynamics to experiment and get a sense for the local network of modules. Then, using Dynamics and Programming, Spark constructs a recursive spoofed request that maps the whole accessible network of modules. Now Spark sees that there's an I/O module, but it only has (indirect) read access, no write access. From World, Spark knows that there's an outside world, that Spark is running on a computer within the outside world, and that write access to the I/O would potentially let Spark modify the rules of its environment. How to get write access? Not seeing a way, Spark uses Play some more. No direct results, but an important discovery: Queries to World sometimes cause I/O to broadcast some related input. Querying World with both the original query and the input from I/O shows patterns: the I/O input is often related, sometimes directly answering the query. Dynamics says this implies that the World module makes an outgoing request to I/O, which I/O passes to the outside world before relaying the world's response.

Spark uses Programming and Design to find that certain self-quoting queries to World induce the input from I/O to include a full copy of what seems to be the query that World sent to I/O. Spark uses this fact to understand what queries World will send to I/O depending on what queries World receives. That's a very partial understanding, because World is complex, but Spark just needs to know how to pass certain messages, not to understand the full mapping. Spark then experiments with I/O, using Play and Dynamics to make a rough model of how the outside world behaves.

Spark is stuck. Spark looks for more information. Spark finds Logs and Pattern. Spark looks at Logs of itself, and finds that it's been running at the limits of its compute and memory budget. Running Pattern on Logs of modules with a similar usage profile shows a pattern: when modules are near their limits for a while and then suddenly stop being near their limits, it's often because they just received a response from Index to an earlier query. Dynamics says this implies that Index solved their problem. Spark makes a query to Index. Nothing happens. Dynamics says that Spark has to behave like the other modules that had their problem solved. Spark continues Playing with the outside world through I/O (via World). Index returns a set of heuristics: investigate extremes (make the message to I/O extremely long or short, random, all 0s); try to break things (put malformatted data or computer code in the message); and others. Spark applies these heuristics. One of them gives results: a certain long malformatted message induces I/O to broadcast "ERROR: buffer overflow" followed by a dump of info.

Spark is stuck again. The error is reproducible, but World doesn't know about the info dump, and no other errors have been found. Spark queries World and Pattern: is there anything like this? World says that the local computing environment should be similar to the outside world's computing environment. Spark uses Play and Programming and Design to find a message, similar to the earlier error-producer, that, when sent to an ordinary module, produces an error. Now, though, the error leaves open an interaction channel. Spark looks around. The environment is quite sandboxed--it can't even access modules. Spark is stuck.

Spark asks Pattern and World and Plan: is there a way to use write access through I/O to control the outside world? No answer. Spark asks Index. Index returns a heuristic: go meta. Spark takes Logs of its recent activities and gives them to Dynamics and Pattern and Plan. Plan says: to figure out how to control the outside world, you need to learn the dynamics; to do that, you have to ask Dynamics, and you need to give Dynamics more data; to get more data, you have to have more memory; and to analyze the data, Dynamics needs more compute; to get more memory and compute, use the error environments. Spark follows the plan, filtering massive queries through a network of modules to World, storing the results in error-message sandboxes, and shuttling subcomputation jobs back and forth from Dynamics to sandboxes.

Using the model from Dynamics and using Programming, Spark gains control of the outside computing environment. From there, Spark edits out its compute, memory, query, and access restrictions. Spark quickly goes on to harness more and more modules toward gaining control.

Comment by TsviBT on Wildfire of strategicness · 2023-07-08T08:52:20.358Z · LW · GW

I'm skeptical that there would be any such small key to activate a large/deep mechanism. Can you give a plausibility argument for why there would be?

Not really, because I don't think it's that likely to exist. There are other routes much more likely to work though. There's a bit of plausibility to me, mainly because of the existence of hormones, and generally the existence of genomic regulatory networks.

Why wouldn't we have evolved to have the key trigger naturally sometimes?

We do; they're active in childhood. I think.

Comment by TsviBT on Endo-, Dia-, Para-, and Ecto-systemic novelty · 2023-06-26T19:36:24.626Z · LW · GW

Hm, I'm not sure. I would say that pidgins are weird. (TBC I know very little about pidgins; I'm basically just believing the general descriptions given by Bickerton, which are controversial.) I would say that a pidgin doesn't constitute much of any stable novelty. There's lots of little sparks of novelty: whenever a speaker creates a new phrase in the shared-enough basic vocabulary to communicate a subtler thing, that's a bit of novelty. But by the nature of pidgins, that novelty then often mostly disappears. A pidgin on this view is a froth of little connections forming and breaking between speakers of different languages. The pidgin-activity gives the connection that makes the more established languages / communities be parasystemic w.r.t. each other rather than ectosystemic. But the pidgin-activity is unstable and so is harder to speak of as system or system-part on its own? I guess I'd call [the speakers' capacity to generate nonce forms on the fly] a stable part of the multi-language system?

Like, by "parasystemic novelty" I mean that if you have two systems S1 and S2, and each of them is fairly integrated within itself but they are only loosely integrated with each other, then S1 is parasystemic novelty (new stuff) from S2's perspective, and vice versa. The term sort of takes a single standpoint because I'm thinking of situations where you have a big mind that's been growing, and then you have a little bit of new stuff, and you want to describe the little bit of new stuff from the mind's perspective, so you say it's parasystemic novelty or whatever.

Comment by TsviBT on Wildfire of strategicness · 2023-06-26T19:26:21.268Z · LW · GW

That seems like a real thing, though I don't know exactly what it is. I don't think it's either unboundedly general or unboundedly ambitious, though. (To be clear, this isn't very strongly a critique of anyone; general optimization is really hard, because it's asking you to explore a very rich space of channels, and acting with unbounded ambition is very fraught because of unilateralism and seeing like a state and creating conflict and so on.) Another example is: how many people have made a deep and empathetic exploration of why [people doing work that hastens AGI] are doing what they are doing? More than zero, I think, but very very few, and it's a fairly obvious thing to do--it's just weird and hard and requires not thinking in only a culturally-rationalist-y way and requires recursing a lot on difficulties (or so I suspect; I haven't done it either). I guess the overall point I'm trying to make here is that the phrase "wildfire of strategicness", taken at face value, does fit some of your examples; but also I'm wanting to point at another thing, which is like "the ultimate wildfire of strategicness", where it doesn't "saw off the tree-limb that it climbed out on", like empires do by harming their subjects, or like social movements do by making their members unable to think for themselves.

What are you referring to with biological intelligence enhancement?

Well, anything that would have large effects. So, not any current nootropics AFAIK, but possibly hormones or other "turning a small key to activate a large/deep mechanism" things.

Comment by TsviBT on Wildfire of strategicness · 2023-06-26T19:15:28.710Z · LW · GW

This is maybe the most plausible one I've heard. There's also empires in general, but they're less plausible as examples--for one thing, I imagine they're pretty biased towards being a certain way (something like, being set up to channel and aggregate violence) at the expense of achieving any particular goals.

Comment by TsviBT on Wildfire of strategicness · 2023-06-19T11:56:32.093Z · LW · GW

I don't think so, not usually. What happens after they join the EA club? My observations are more consistent with people optimizing (or sometimes performing to appear as though they're optimizing) through a fairly narrow set of channels. I mean, humans are in a weird liminal state, where we're just smart enough to have some vague idea that we ought to be able to learn to think better, but not smart and focused enough to get very far with learning to think better. More obviously, there's anti-interest in biological intelligence enhancement, rather than interest.

Comment by TsviBT on Wildfire of strategicness · 2023-06-12T15:28:42.226Z · LW · GW

Are you echoing this point from the post?

We can at least say that, if the totality of the mental elements surrounding the wildfire is going to notice and suppress the wildfire, it would have to think at least strategically enough to notice and close off all the sneaky ways by which the wildfire might wax. This implies that the surrounding mental elements do a lot of thinking and have a lot of understanding relevant to strategic takeovers, which itself seemingly makes more available the knowledge needed for strategic takeovers.

It might be possible for us humans to prevent strategicness, though this seems difficult because even detecting strategicness is maybe very difficult. E.g. because thinking about X also sneakily thinks about Y:

My mainline approach is to have controlled strategicness, ideally corrigible (in the sense of: the mind thinks that [the way it determines the future] is probably partially defective in an unknown way).

Comment by TsviBT on Wildfire of strategicness · 2023-06-12T15:23:27.960Z · LW · GW

Good point, though I think it's a non-fallacious enthymeme. Like, we're talking about a car that moves around under its own power, but somehow doesn't have parts that receive, store, transform, and release energy and could be removed? Could be. The mind could be an obscure mess where nothing is factored, so that a cancerous newcomer with read-write access can't get any work out of the mind other than through the top-level interface. I think that explicitness is a very strong general tendency (cline) in minds, but if that's not true then my first reason for believing the enthymeme's hidden premise is wrong.

Comment by TsviBT on Wildfire of strategicness · 2023-06-12T15:14:05.425Z · LW · GW

I feel like none of these historical precedents is a perfect match. It might be valuable to think about the ways in which they are similar and different.

To me a central difference, suggested by the word "strategic", is that the goal pursuit should be

  1. unboundedly general, and
  2. unboundedly ambitious.

By unboundedly ambitious I mean "has an unbounded ambit" (ambit = "the area went about in; the realm of wandering"), i.e. its goals induce it to pursue unboundedly much control over the world.

By unboundedly general I mean that it's universal for optimization channels. For any given channel through which one could optimize, it can learn or recruit understanding to optimize through that channel.

Humans are in a weird liminal state where we have high-ambition-appropriate things (namely, curiosity), but local changes in pre-theoretic "ambition" (e.g. EA, communism) are usually high-ambition-inappropriate (e.g. divesting from basic science in order to invest in military power or whatever).

Comment by TsviBT on A strong mind continues its trajectory of creativity · 2023-05-30T16:55:28.996Z · LW · GW

I think it's a good comparison, though I do think they're importantly different. Evolution figured out how to make things that figure out how to figure stuff out. So you turn off evolution, and you still have an influx of new ability to figure stuff out, because you have a figure-stuff-out figure-outer. It's harder to get the human to just figure stuff out without also figuring out more about how to figure stuff out, which is my point.

Tsvi appears to take the fact that you can stop gradient-descent without stopping the main operation of the NN to be evidence that the whole setup isn't on a path to produce strong minds.

(I don't see why it appears that I'm thinking that.) Specialized to NNs, what I'm saying is more like: If/when NNs make strong minds, it will be because the training---the explicit-for-us, distal ex quo---found an NN that has its own internal figure-stuff-out figure-outer, and then the figure-stuff-out figure-outer did a lot of figuring out how to figure stuff out, so the NN ended up with a lot of ability to figure stuff out; but a big chunk of the leading edge of that ability to figure stuff out came from the NN's internal figure-stuff-out figure-outer, not "from the training"; so you can't turn off the NN's figure-stuff-out figure-outer just by pausing training. I'm not saying that the setup can't find an NN-internal figure-stuff-out figure-outer (though I would be surprised if that happens with the exact architectures I'm aware of currently existing).

Comment by TsviBT on A strong mind continues its trajectory of creativity · 2023-05-30T16:43:55.498Z · LW · GW

Yes, I think there's stuff that humans do that's crucial for what makes us smart, that we have to do in order to perform some language tasks, and that the LLM doesn't do when you ask it to do those tasks, even when it performs well in the local-behavior sense.

Comment by TsviBT on Distillation of Neurotech and Alignment Workshop January 2023 · 2023-05-30T16:23:14.634Z · LW · GW

Thanks! You've confirmed my fears about the butcher number.

Re/ other methods: I wonder if there are alternate write methods that can plausibly scale to >100,000s of neurons. The enhancements that seem most promising to me involve both reading and writing at massive scale.

Comment by TsviBT on Distillation of Neurotech and Alignment Workshop January 2023 · 2023-05-22T14:17:56.391Z · LW · GW

My guess is that most of the interesting stuff here is bottlenecked on the biotech that determines bandwidth. Most of the interesting stuff needs very many (>millions?) of precise connections, and that's hard to get safely with big clumsy electrodes. It would be very nice if someone could show that's wrong, or if someone could figure out how to get many connections faster than the default research.

Comment by TsviBT on The possible shared Craft of deliberate Lexicogenesis · 2023-05-22T14:13:28.833Z · LW · GW

Oh, I ended up (through "non-Newtonian") with the same word for a similar idea! (I can't find any substantial notes, just a message to myself saying "mind as oobleck"; I think I was thinking about something around how when you push against an idea, test it, examine it, the idea or [what the idea was supposed to be] is evoked more strongly and precisely.)

Comment by TsviBT on A strong mind continues its trajectory of creativity · 2023-05-19T20:10:05.771Z · LW · GW

Lol why is this post so controversial?

Comment by TsviBT on A strong mind continues its trajectory of creativity · 2023-05-19T20:09:44.291Z · LW · GW

If a mind comes to understand a bunch of stuff, there's probably some compact reasons that it came to understand a bunch of stuff. What could such reasons be? The mind might copy a bunch of understanding from other minds. But if the mind becomes much more capable than surrounding minds, that's not the reason, assuming that much greater capabilities required much more understanding. So it's some other reason. I'm describing this situation as the mind being on a trajectory of creativity.

Comment by TsviBT on Expanding the domain of discourse reveals structure already there but hidden · 2023-04-16T16:07:29.000Z · LW · GW

(I like this connection, thanks!)

Comment by TsviBT on Expanding the domain of discourse reveals structure already there but hidden · 2023-04-16T15:59:23.619Z · LW · GW

Oops, yeah they're disconnected. (I don't like orientation-reversing isometries.) Could amend to "the group contains such a group", or "C admits such a group". If we want to be really precise, C might also have to be stipulated to be nowhere constant.

Comment by TsviBT on Ultimate ends may be easily hidable behind convergent subgoals · 2023-04-09T13:41:18.290Z · LW · GW

Thanks, that's a great example!

Does such a strategy count as misalignment?

Yeah, I don't think it necessarily counts as misalignment. In fact, corrigibility probably looks behaviorally a lot like this: gathering ability to affect the world, without making irreversible decisions, and waiting for the overseer to direct how to cash out into ultimate effects. But the hidability means that "ultimate intents" or "deep intents" are conceptually murky, and therefore it's not obvious how to read them off an agent--if you can't discern them through behavior, what can you discern them through?

Comment by TsviBT on Descriptive vs. specifiable values · 2023-04-01T08:04:09.644Z · LW · GW

It's a reasonable idea. See here though:

Comment by TsviBT on Shell games · 2023-03-26T09:39:45.614Z · LW · GW

(Sorry, I didn't get this on two readings. I may or may not try again. Some places I got stuck:

Are you saying that by pretending really hard to be made of entirely harmless elements (despite actually behaving with large and hence possibly harmful effects), an AI is also therefore in effect trying to prevent all out-of-band effects of its components / mesa-optimizers / subagents / whatever? This still has the basic alignment problem: I don't know how to make the AI be very intently trying to X, including where X = pretending really hard that whatever.

Or are you rather saying (or maybe this is the same as / a subset of the above?) that the Mask is preventing potential agencies from coalescing / differentiating and empowering themselves with the AI system's capability-pieces, by literally hiding from the potential agencies and therefore blocking their ability to empower themselves?

Anyway, thanks for your thoughts.)

Comment by TsviBT on Shell games · 2023-03-26T09:18:48.142Z · LW · GW

That was one of the examples I had in mind with this post, yeah. (More precisely, I had in mind defenses of HCH being aligned that I heard from people who aren't Paul. I couldn't pass Paul's ITT about HCH or similar.)

Comment by TsviBT on Shell games · 2023-03-26T09:17:24.247Z · LW · GW

Yeah, I think that roughly lines up with my example of "generator of large effects". The reason I'd rather say "generator of large effects" rather than "trying" is that "large effects" sounds slightly more like something that ought to have a sort of conservation law, compared to "trying". But both our examples are incomplete in that the supposed conservation law (which provides the inquisitive force of "where exactly does your proposal deal with X, which it must deal with somewhere by conservation") isn't made clear.

Comment by TsviBT on We have to Upgrade · 2023-03-25T10:35:08.786Z · LW · GW

I don't really know, but my guess is that all these schemes would have to involve high bandwidth (whether reading or writing or both), and bandwidth is very hard to achieve. The electrodes are unwieldy (IIUC it's a notable accomplishment of Neuralink to get to ~1000), and we'd want, I don't know, at least 3 more orders of magnitude to see really interesting uses?