Posts

Koan: divining alien datastructures from RAM activations 2024-04-05T18:04:57.280Z
What could a policy banning AGI look like? 2024-03-13T14:19:07.783Z
A hermeneutic net for agency 2024-01-01T08:06:30.289Z
What is wisdom? 2023-11-14T02:13:49.681Z
Human wanting 2023-10-24T01:05:39.374Z
Hints about where values come from 2023-10-18T00:07:58.051Z
Time is homogeneous sequentially-composable determination 2023-10-08T14:58:15.913Z
Telopheme, telophore, and telotect 2023-09-17T16:24:03.365Z
Sum-threshold attacks 2023-09-08T17:13:37.044Z
Fundamental question: What determines a mind's effects? 2023-09-03T17:15:41.814Z
Views on when AGI comes and on strategy to reduce existential risk 2023-07-08T09:00:19.735Z
The fraught voyage of aligned novelty 2023-06-26T19:10:42.195Z
Provisionality 2023-06-19T11:49:06.680Z
Explicitness 2023-06-12T15:05:04.962Z
Wildfire of strategicness 2023-06-05T13:59:17.316Z
The possible shared Craft of deliberate Lexicogenesis 2023-05-20T05:56:41.829Z
A strong mind continues its trajectory of creativity 2023-05-14T17:24:00.337Z
Better debates 2023-05-10T19:34:29.148Z
An anthropomorphic AI dilemma 2023-05-07T12:44:48.449Z
The voyage of novelty 2023-04-30T12:52:16.817Z
Endo-, Dia-, Para-, and Ecto-systemic novelty 2023-04-23T12:25:12.782Z
Possibilizing vs. actualizing 2023-04-16T15:55:40.330Z
Expanding the domain of discourse reveals structure already there but hidden 2023-04-09T13:36:28.566Z
Ultimate ends may be easily hidable behind convergent subgoals 2023-04-02T14:51:23.245Z
New Alignment Research Agenda: Massive Multiplayer Organism Oversight 2023-04-01T08:02:13.474Z
Descriptive vs. specifiable values 2023-03-26T09:10:56.334Z
Shell games 2023-03-19T10:43:44.184Z
Are there cognitive realms? 2023-03-12T19:28:52.935Z
Do humans derive values from fictitious imputed coherence? 2023-03-05T15:23:04.065Z
Counting-down vs. counting-up coherence 2023-02-27T14:59:39.041Z
Does novel understanding imply novel agency / values? 2023-02-19T14:41:40.115Z
Please don't throw your mind away 2023-02-15T21:41:05.988Z
The conceptual Doppelgänger problem 2023-02-12T17:23:56.278Z
Control 2023-02-05T16:16:41.015Z
Structure, creativity, and novelty 2023-01-29T14:30:19.459Z
Gemini modeling 2023-01-22T14:28:20.671Z
Non-directed conceptual founding 2023-01-15T14:56:36.940Z
Dangers of deference 2023-01-08T14:36:33.454Z
The Thingness of Things 2023-01-01T22:19:08.026Z
[link] The Lion and the Worm 2022-05-16T20:40:22.659Z
Harms and possibilities of schooling 2022-02-22T07:48:09.542Z
Rituals and symbolism 2022-02-10T16:00:14.635Z
Index of some decision theory posts 2017-03-08T22:30:05.000Z
Open problem: thin logical priors 2017-01-11T20:00:08.000Z
Training Garrabrant inductors to predict counterfactuals 2016-10-27T02:41:49.000Z
Desiderata for decision theory 2016-10-27T02:10:48.000Z
Failures of throttling logical information 2016-02-24T22:05:51.000Z
Speculations on information under logical uncertainty 2016-02-24T21:58:57.000Z
Existence of distributions that are expectation-reflective and know it 2015-12-10T07:35:57.000Z
A limit-computable, self-reflective distribution 2015-11-15T21:43:59.000Z

Comments

Comment by TsviBT on The Shutdown Problem: Incomplete Preferences as a Solution · 2024-04-02T19:38:37.124Z · LW · GW

IDK if this is a crux for me thinking this is very relevant to stuff on my perspective, but:

The training procedure you propose doesn't seem to actually incentivize indifference. First, a toy model where I agree it does incentivize that:

On the first time step, the agent gets a choice: choose a number 1--N. If the agent says k, then the agent has nothing at all to do for the first k steps, after which some game G starts. (Each play of G is i.i.d., not related to k.)

So this agent is indeed incentivized to pick k uniformly at random from 1--N. Now consider:

The agent is in a rich world. There are many complex multi-step plans to incentivize the agent to learn problem-solving. Each episode, at time N, the agent gets to choose: end now, or play 10 more steps.

Does this incentivize random choice at time N? No. It incentivizes the agent to choose randomly End or Continue at the very beginning of the episode, and then carefully plan and execute behavior that achieves the most reward assuming a run of length N or N+10 respectively.

Wait, but isn't this success? Didn't we make the agent have no trajectory length preference?

No. Suppose:

Same as before, but now there's a little guy standing by the End/Continue button. Sometimes he likes to press the button randomly.

Do we kill the guy? Yes we certainly do, he will mess up our careful plans.

Comment by TsviBT on [deleted post] 2024-03-24T14:16:57.983Z

I think it's a norm if you're bidding for specific attention, yeah. Like, you should either do more work to figure out a smaller set of people who you want to bid for attention from, or else at least say that you haven't done that work.

Comment by TsviBT on [deleted post] 2024-03-24T14:15:19.597Z

The examples in this comment are about "oops I had an idea that sounds good but is accidentally bad". That's a reasonable thing to worry about but doesn't seem like the thing you were actually asking about. You wrote:

I don't expect to be particularly good at coordinating with my perfect clones for example. I'm sure if you put me in a room with my perfect clone and a source of massive power (such as a controllable ASI), we'd beat each other half to death fighting for it.

This seems much more central, and indicates a major problem.

Comment by TsviBT on [deleted post] 2024-03-24T06:56:10.093Z

Is this referring to my insights in particular or something similar somebody else said?

It's meant to gesture at a category of thinking, a given instance of which may or may not be worthwhile or interesting, but which leads people to be very overly worried about the consequences of spreading the ideas involved, compared to how bad the consequences actually are. For example, sometimes [people who take hypothetical possibilities very seriously] newly think of something, such as the potential of BCIs or the potential of thinking in such-and-such unconventional way or whatever. Then they implicitly reason like this: There's a bunch of potential here; previously I hadn't thought of this idea; previously I hadn't pursued efforts related to this idea; now I've thought of this idea; the fact that I just now thought of the idea and hadn't previously explains away the fact that I haven't previously pursued related efforts; so probably my straightforward inside view of why there's potential here is correct or at least a good rough draft guess; which means there are huge implications here; and the reason others aren't pursuing related efforts is probably that they didn't think of the idea; and since the idea is powerful, I shouldn't share it.

Usually some but not all of these inferences are correct. Often the neglectedness is mainly because others don't believe in hypothetical possibilities, not because no one has thought of it. Rarely does the final inference go through.

I’ve already had conversations with multiple billionaires.

I would think the problem here would be failing at transferring the relevant info, not transferring too much info!

But if you manage to get their attention you could get them to copy your preferred choices instead.

The only morally acceptable thing to copy in this way is an orientation against making decisions this way.

Comment by TsviBT on [deleted post] 2024-03-24T04:32:50.817Z

For the most part, if you have a reason to share some information, you should share it. For the most part, trying to make a bunch of information boundaries will cripple your ability to do anything useful, and doesn't avert much bad stuff. Your amazing strategic insights about how we're all swimming in a sea of hyperstitious memetic warfare and therefore we can control the future by blah blah are usually false, and not actually that big if true because in general things are more in equilibrium than they seem and more driven by forces you're not controlling than they seem. The more open I am about things I thought I should be cagey about, the more I find no one cares. Unless you've got a lot of attention for some reason, roughly no one cares about what you think enough to do much of anything in response to what you think.

There are obvious exceptions, like not sharing other people's personal info in public or not sharing your garage nuke technology.

Distinguish [trust to not harm you, e.g. by misusing info you've shared] from [trust to meet your efforts toward a shared goal]. The latter is generally more important than the former, because lifeforce is a pretty limited resource, so you have to know where to invest yours.

Comment by TsviBT on [deleted post] 2024-03-24T04:20:23.266Z

I think it's pretty defecty to email a lot of people without saying in the email that you've done so.

Comment by TsviBT on Toward a Broader Conception of Adverse Selection · 2024-03-15T01:49:29.789Z · LW · GW

Bad restaurants are more likely to have open tables than good restaurants.

That seems dependent on it being difficult to scale the specific skill that went into putting together the experience at the good restaurant. Things that are more scalable, like small consumer products, can be selected to be especially good trades (the bad ones don't get popular and inexpensive).

Comment by TsviBT on Toward a Broader Conception of Adverse Selection · 2024-03-15T00:32:13.289Z · LW · GW

Bruh. Banana Laffy Taffy is the best. Happy to trade away non-banana to receive banana, 1:1.

Comment by TsviBT on What Software Should Exist? · 2024-01-20T03:59:00.569Z · LW · GW

The point of the essay is to describe the context that would make one want a hyperphone, so that

  1. one can be motivated by the possibility of a hyperphone, and

  2. one could get a hold of the criteria that would direct developing a good hyperphone.

The phrase "the ability to branch in conversations" doesn't do either of those.

Comment by TsviBT on What Software Should Exist? · 2024-01-20T00:07:05.699Z · LW · GW

Quoting another comment I made:

Make a hyperphone. A majority of my alignment research conversations would be enhanced by having a hyperphone, to a degree somewhere between a lot and extremely; and this is heavily weighted on the most hopeworthy conversations. (Also sometimes when I explain what a hyperphone is well enough for the other person to get it, and then we have a complex conversation, they agree that it would be good. But very small N, like 3 to 5.)

https://tsvibt.blogspot.com/2023/01/hyperphone.html

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2024-01-18T23:00:36.625Z · LW · GW

Yes.

Comment by TsviBT on A hermeneutic net for agency · 2024-01-02T15:37:46.403Z · LW · GW

It's a makeshift stop-gradient. I feel less like I'm writing to LessWrong if I'm not publishing it immediately, and although LW is sadly the best place on the internet that I'm aware of, it's very much not in aggregate a gradient I want. Sometimes I write posts intended for LW and publish them immediately.

Comment by TsviBT on The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda · 2023-12-18T23:42:04.578Z · LW · GW

Make a hyperphone. A majority of my alignment research conversations would be enhanced by having a hyperphone, to a degree somewhere between a lot and extremely; and this is heavily weighted on the most hopeworthy conversations. (Also sometimes when I explain what a hyperphone is well enough for the other person to get it, and then we have a complex conversation, they agree that it would be good. But very small N, like 3 to 5.)

https://tsvibt.blogspot.com/2023/01/hyperphone.html

Comment by TsviBT on Non-directed conceptual founding · 2023-12-18T20:55:29.034Z · LW · GW

I'm not sure I understand your question at all, sorry. I'll say my interpretation and then answer that. You might be asking:

Is the point of the essay summed up by saying: " "Thing=Nexus" is not mechanistic/physicalist, but it's still useful; in general, explanations can be non-mechanistic etc., but still be useful, perhaps by giving a functional definition of something."?

My answer is no, that doesn't sum up the essay. The essay makes these claims:

  1. There are many different directions in conceptspace that could be considered "more foundational", each with their own usefulness and partial coherence.
  2. None of these directions gives a total ordering that satisfies all the main needs of a "foundational direction".
  3. Some propositions/concepts not only fail to go in [your favorite foundational direction], but are furthermore circular; they call on themselves.
  4. At least for all the "foundational directions" I listed, circular ideas can't be going in that direction, because they are circular.
  5. Nevertheless, a circular idea can be pretty useful.

I did fail to list "functional" in my list of "foundational directions", so thanks for bringing it up. What I say about foundational directions would also apply to "functional".

Comment by TsviBT on What is wisdom? · 2023-12-18T20:39:41.671Z · LW · GW

Hm, ok, thanks. I don't think I fully understand+believe your claims. For one thing, I would guess that many people do think and act, under the title "Buddhism", as if they believe that desire is the cause of suffering.

If I instead said "Clinging/Striving is the cause of [painful wheel-spinning in pursuit of something missing]", is that any closer? (This doesn't really fit what I'm seeing in the Wiki pages.) I would also say that decompiling clinging/striving in order to avoid [painful wheel-spinning in pursuit of something missing] is tantamount to nihilism. (But maybe to learn what you're offering I'd have to do more than just glance at the Wiki pages.)

Comment by TsviBT on The Shortest Path Between Scylla and Charybdis · 2023-12-18T20:30:10.916Z · LW · GW

As you can see, the failures lie on a spectrum, and they're model-dependent to boot.

And we can go further and say that the failures lie in a high-dimensional space, and that the apparent tradeoff is more a matter of finding the directions in which to pull the rope sideways. Propagating constraints between concepts and propositions is a way to go that seems hopeworthy to me. One wants to notice commonalities in how each of one's plans are doomed, and then address the common blockers / missing ideas. In other words, recurse to the "abstract" as much as is called for, even if you get really abstract; but treat [abstracting more than what you can directly see/feel as being demanded by your thinking] as a risky investment with opportunity cost.

Comment by TsviBT on Current AIs Provide Nearly No Data Relevant to AGI Alignment · 2023-12-18T20:25:10.235Z · LW · GW

Thanks for writing this and engaging in the comments. "Humans/humanity offer the only real GI data, so far" is a basic piece of my worldview and it's nice to have a reference post explaining something like that.

Comment by TsviBT on Does davidad's uploading moonshot work? · 2023-11-03T06:14:16.544Z · LW · GW

all the cognitive information processing

I don't understand what's being claimed here, and feel the urge to get off the boat at this point without knowing more. Most stuff we care about isn't about 3-second reactions, but about >5 minute reactions. Those require thinking, and maybe require non-electrical changes--synaptic plasticity, as you mention. If they do require non-electrical changes, then this reasoning doesn't go through, right? If we make a thing that simulates the electrical circuitry but doesn't simulate synaptic plasticity, we'd expect to get... I don't know, maybe a thing that can perform tasks that are already "compiled into low-level code", so to speak, but not tasks that require thinking? Is the claim that thinking doesn't require such changes, or that some thinking doesn't require such changes, and that subset of thinking is enough for greatly decreasing X-risk?

Comment by TsviBT on Hints about where values come from · 2023-11-02T01:34:40.955Z · LW · GW

If we were to break down where a value comes from it would have to be from some combination of these basic drives, cortical tendencies (e.g. vulnerability to optical illusions), and learned behavior.

I wouldn't want to say this is false, but I'd want to say that speaking like this is a red flag that we haven't understood what values are in the appropriate basis. We can name some dimensions (the ones you list, and others), but then our values are rotated with respect to this basis; our values are some vector that cuts across these basis vectors. We lack the relevant concepts. When you say that you experience "the underlying drivers behind your goals" as being constant, I'm skeptical, not because I don't think there's something that's fairly fixed, but because we lack the concepts to describe that fixed thing, and so it's hard to see how you could have a clear experience of the fixedness. At most you could have a vague sense that perhaps there is something fixed. And if so, then I'd want to take that sense as a pointer toward the as-yet not understood ideas.

Comment by TsviBT on Sum-threshold attacks · 2023-11-01T20:59:21.653Z · LW · GW

Yeah that sounds like an example!

Comment by TsviBT on Sum-threshold attacks · 2023-11-01T18:42:25.868Z · LW · GW

Maybe? It's a bit weird because that situation would involve some non-unified agency, which we don't understand super well. Like, you'd have [you_1, who decides to enforce various small habits] and separately [you_2, who is made of unconscious habits], and you_1 is supposed to be sending influences to you_2 in a way that is below some threshold--what's the threshold? Is it that you_2 will resist / not be dragged along with large changes, but can be forced into small ones? And then, there's supposed to be some large aggregate effect. What is that? Is it just a bunch of small habits, or is the point that something else changes? Is that large change supposed to be in you_2?

Haven't seen that movie, maybe I will later.

Comment by TsviBT on Hints about where values come from · 2023-11-01T18:37:45.188Z · LW · GW

the underlying drivers behind my goals seem fairly constant throughout my life

What are these specifically, and what type of thing are they? Were they there when you were born? Were they there "implicitly" but not "explicitly"? In what sense were they always there (since whenever you claim they started being there)?

Surely your instrumental goals change, and this is fine and is a result of learning, as you say. So when something changes, you say: Ah, this wasn't my values, this was instrumental goals. But how can you tell that there's something fixed that underlies or overarches all the changing stuff? What is it made of?

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-10-24T01:31:24.120Z · LW · GW

I roughly agree. As I mentioned to Adele, I think you could get sort of lame edge cases where the LLM kinda helped find a new concept. The thing that would make me think the end is substantially nigher is if you get a model that's making new concepts of comparable quality at a comparable rate to a human scientist in a domain in need of concepts.

if you nail some Chris Olah style transparency work

Yeah that seems right. I'm not sure what you mean by "about language". Sorta plausibly you could learn a little something new about some non-language domain that the LLM has seen a bunch of data about, if you got interpretability going pretty well. In other words, I would guess that LLMs already do lots of interesting compression in a different way than humans do it, and maybe you could extract some of that. My quasi-prediction would be that those concepts

  1. are created using way more data than humans use for many of their important concepts; and
  2. are weirdly flat, and aren't suitable out of the box for a big swath of the things that human concepts are suitable for.

Comment by TsviBT on Sum-threshold attacks · 2023-10-24T01:23:52.342Z · LW · GW

I think you're right that the human process of science is vulnerable this way.

Comment by TsviBT on Sum-threshold attacks · 2023-10-24T01:21:34.192Z · LW · GW

I think close companions can sometimes see sum-threshold attacks that the target can't see, but some attacks go unnoticed by anyone for a long while. I think poorly-resourced agents can carry out these attacks.

Comment by TsviBT on Sum-threshold attacks · 2023-10-24T01:18:48.421Z · LW · GW

Excellent, thanks!

Comment by TsviBT on Sum-threshold attacks · 2023-10-24T01:16:31.583Z · LW · GW

Nice, yeah. This seems like centrally salami slicing.

Comment by TsviBT on Sum-threshold attacks · 2023-10-24T01:15:46.602Z · LW · GW

What do you mean by "the primary factor"? The primary factor in what? I think it's true that in many cases, microaggressions don't matter in comparison to macroaggressions. E.g. when people make Jew jokes, I'm a little on edge not because of the joke itself, but because the joke being told is a bit of evidence (and maybe a part of a strategy) that some people are trying to gain common knowledge of a shared goal of physically attacking Jews. On the other hand I think that in some contexts, the microaggressions together are an attack on their own, even without the threats.

Comment by TsviBT on Hints about where values come from · 2023-10-24T01:11:52.838Z · LW · GW

And it seems arthrodiatomic (cutting across joints, i.e. non-joint-carving) to describe the envelope-extension process itself as being an instance of homeostasis.

Comment by TsviBT on Hints about where values come from · 2023-10-24T01:10:24.041Z · LW · GW

This is a non-answer, and I wish you'd notice on your own that it's a non-answer. From the dialogue:

Really I want to know the shape of values as they sit in a mind. I want to know that because I want to make a mind that has weird-shaped values. Namely, Corrigibility.

So, given that you know where values come from, do you know what it looks like to have a deeply corrigible strong mind, clearly enough to make one? I don't think so, but please correct me if you do. Assuming you don't, I suggest that understanding what values are and where they come from in a more joint-carving way might help.

In other words, saying that, besides some details, values come as "the result of how we're "wired" up into feedback loops" is true enough, but not an answer. It would be like saying "our plans are the result of how our neurons fire" or "the Linux operating system is the result of how electrons move through the wires in my computer". It's not false, it's just not an answer to the question we were asking.

Comment by TsviBT on Telopheme, telophore, and telotect · 2023-10-15T18:50:25.038Z · LW · GW

Ok yeah I agree with this. Related: https://tsvibt.blogspot.com/2023/09/the-cosmopolitan-leviathan-enthymeme.html#pointing-at-reality-through-novelty

And an excerpt from a work in progress:

Example: Blueberries

For example, I reach out and pick up some blueberries. This is some kind of expression of my values, but how so? Where are the values?

Are the values in my hands? Are they entirely in my hands, or not at all in my hands? The circuits that control my hands do what they do with regard to blueberries by virtue of my hands being the way they are. If my hands were different, e.g. really small or polydactylous, my hand-controller circuits would be different and would behave differently when getting blueberries. And the deeper circuits that coordinate visual recognition of blueberries, and the deeper circuits that coordinate the whole blueberry-getting system and correct errors based on blueberrywise success or failure, would also be different. Are the values in my visual cortex? The deeper circuits require some interface with my visual cortex, to do blueberry find-and-pick-upping. And having served that role, my visual cortex is specially trained for that task, and it will even promote blueberries in my visual field to my attention more readily than yours will to you. And my spatial memory has a nearest-blueberries slot, like those people who always know which direction is north.

It may be objected that the proximal hand-controllers and the blueberry visual circuits are downstream of other deeper circuits, and since they are downstream, they can be excluded from constituting the value. But that's not so clear. To like blueberries, I have to know what blueberries are, and to know what blueberries are I have to interact with them. The fact that I value blueberries relies on me being able to refer to blueberries. Certainly, if my hands were different but comparably versatile, then I would learn to use them to refer to blueberries about as well as my real hands do. But the reference to (and hence the value of) blueberries must pass through something playing the role that hands play. The hands, or something else, must play that role in constituting the fact that I value blueberries.

The concrete is never lost

In general, values are founded on reference. The context that makes a value a value has to provide reference.

The situation is like how an abstract concept, once gained, doesn't overwrite and obsolete what was abstracted from. Maxwell's equations don't annihilate Faraday's experiments in their detail. The experiments are unified in idea--metaphorically, the field structures are a "cross-section" of the messy detailed structure of any given experiment. The abstract concepts, to say something about a specific concrete experimental situation, have to be paired with specific concrete calculations and referential connections. The concrete situations are still there, even if we now, with our new abstract concepts, want to describe them differently.

Is reference essentially diasystemic?

If so, then values are essentially diasystemic.

Reference goes through unfolding.

To refer to something in reality is to be brought (or rather, bringable) to the thing. To be brought to a thing is to go to where the thing really is, through whatever medium is between the mind and where the thing really is. The "really is" calls on future novelty. See "pointing at reality through novelty".

In other words, reference is open--maybe radically open. It's supposed to incorporate whatever novelty the mind encounters--maybe deeply.

An open element can't be strongly endosystemic.

An open element will potentially relate to (radical, diasystemic) novelty, so its way of relating to other elements can't be fully stereotyped by preexisting elements with their preexisting manifest relations.

Comment by TsviBT on Telopheme, telophore, and telotect · 2023-10-08T15:13:06.081Z · LW · GW

It's definitely like symbol grounding, though symbol grounding is usually IIUC about "giving meaning to symbols", which I think has the emphasis on epistemic signifying?

Comment by TsviBT on Sum-threshold attacks · 2023-10-08T15:10:36.321Z · LW · GW

True. But often the target can't do that test, e.g. because it's costly or because they don't actually know what to look for. Also, the "threshold" is sometimes not about the target, but about a third party, e.g. another person who's supposed to judge whether the attacked is really being attacked. Verbal abuse is an example of both: the abused often doesn't have concepts to describe what's happening, and so doesn't know what to look for and doesn't know what to say to a judge; and because the abuse comes along with pain and distraction, it's costly to track the sum; and there's noise and ambiguity, so the judge doesn't credit any one instance; and the judge may not accept a description of the sum, but only accepts an accounting of each instance, which imposes sum-sized costs on reporting a sum-sized attack.

Comment by TsviBT on Telopheme, telophore, and telotect · 2023-10-08T15:04:10.636Z · LW · GW

I think that your question points out how the concepts as I've laid them out don't really work. I now think that values such as liking a certain process or liking mental properties should be treated as first-class values, and this pretty firmly blurs the telopheme / telophore distinction.

Comment by TsviBT on Sum-threshold attacks · 2023-10-08T15:01:36.116Z · LW · GW

the optimal conlang isn't a new set of words. it's a new set of practices for naming things, unnaming things, generalising & specialising, communal decision-processes for resolving conflicts, neat meta-structures that minimise cost of refactoring (somehow), enabling eager contributors w minimal overhead & risk of degeneration, etc.

Absolutely.

Comment by TsviBT on Telopheme, telophore, and telotect · 2023-09-19T16:20:56.114Z · LW · GW

Ok, here: https://docs.google.com/spreadsheets/d/1V1QERPIKzpZNtS10hTwfe9aDyfmYWU8QbftBo9jIi9I/edit#gid=0

It's just what's shown in the screenshot though.

Comment by TsviBT on Sum-threshold attacks · 2023-09-19T16:16:48.064Z · LW · GW

I think I have a couple other specific considerations:

  1. By getting ahold of the structure better, the structure can be better analyzed on its own terms. Drawing out implications, resolving inconsistencies, refactoring, finding non-obvious structural analogies or examples that I wouldn't find by ever actually being in the situation randomly.
  2. By getting ahold of the structure better, the structure can be better used in the abstract within other thinking that wants to think in related regions ("at a similar level of abstraction").
  3. Values (goal-pursuits, etc.) tend to want to flow through elements in all regions; they aren't just about the phenomenal presentation of situations. So I want to understand and name the real structure, so values can flow through the real structure more easily.

And a general consideration, which is like: I don't have good reason to think I see all the sorts of considerations going into good words / concepts / language, and I've previously thought I had understood much of the reasons only to then discover further important ones. Therefore I should treat as Not Yet Replaceable the sense I have of "naming the core structure", like how you want to write "elegant" code even without a specific reason. I want to step further into the inner regions of the Thing(s) at hand.

Comment by TsviBT on How to talk about reasons why AGI might not be near? · 2023-09-17T16:33:15.310Z · LW · GW

IME a lot of people's stated reasons for thinking AGI is near involve mistaken reasoning and those mistakes can be discussed without revealing capabilities ideas: https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:38:08.913Z · LW · GW

Interesting. I think I have a different approach, which is closer to

Find the true name of the thing--a word that makes the situation more understandable, more recognizable, by clarifying the core structure of the thing.

True name doesn't necessarily mean a literal description of the core structure of the thing, though "sum-threshold" is such a literal description. "Anastomosis / anabranching (attack)" is metaphorical, but the point is, it's a metaphor for the core structure of the thing.

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:32:12.735Z · LW · GW

Nice, thanks.

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:29:22.982Z · LW · GW

This reminds me of these two Derren Brown videos: https://www.youtube.com/watch?v=43Mw-f6vIbo https://www.youtube.com/watch?v=sEmCQzueyEQ

I assume (but don't know for sure) that what's happening in the videos isn't as it appears (e.g. forging handwriting isn't that hard), but it's at least an interesting fictional example of a somewhat-additive attack like this.

Comment by TsviBT on Sum-threshold attacks · 2023-09-13T12:17:39.576Z · LW · GW

Yeah, this is why I didn't include steganography. (I don't know whether adversarial images are more like steganography / a message or more like a sum-threshold attack.)

Comment by TsviBT on Sum-threshold attacks · 2023-09-09T11:47:13.943Z · LW · GW

Thanks! That does seem at least pretty close. Wiki says:

Salami slicing tactics, also known as salami slicing, salami tactics, the salami-slice strategy, or salami attacks,[1] is the practice of using a series of many small actions to produce a much larger action or result that would be difficult or unlawful to perform all at once.

This is a pretty close match. But then, both the metaphor and many of the examples seem specifically about cutting up a big thing into little things--slicing the salami, slicing a big pile of money, slicing some territory. Some other examples have a frogboiling flavor: using acclimation to gradually advance (the kid getting into the water, China increasing presence in the sea), violating a boundary. (The science publishing examples seem like milking, not salami slicing / sum-threshold.) A "pure" sum-threshold attack doesn't have to look like either of those. E.g. a DDoS attack has the anastomosis structure without having a concrete thing that's being cut up and taken or a slippery slope that's being pushed along; peer pressure often involves slippery slopes / frogboiling, but can also be "pure" in this sense, if it's a binary decision that's being pressured.
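To make the "pure" structure concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the function name, the per-item limit, the total threshold, and the DDoS-like numbers. It only checks the two conditions that the name "sum-threshold" literally describes: every individual contribution stays under whatever limit would get it flagged, while the sum still crosses the threshold that matters.

```python
def is_sum_threshold(contributions, per_item_limit, total_threshold):
    """Return True when no single contribution reaches the per-item limit,
    yet the contributions together reach the total threshold.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    # Each piece on its own looks innocuous.
    each_small = all(abs(c) < per_item_limit for c in contributions)
    # Taken together, the pieces cross the line that matters.
    total_large = sum(contributions) >= total_threshold
    return each_small and total_large


# Hypothetical DDoS-like case: 1000 clients, 2 requests per second each.
many_small = [2] * 1000
# The same load coming from one obviously abusive client.
one_big = [2000]
# Small requests, but too few of them to matter.
too_few = [2] * 100

print(is_sum_threshold(many_small, per_item_limit=10, total_threshold=1500))
print(is_sum_threshold(one_big, per_item_limit=10, total_threshold=1500))
print(is_sum_threshold(too_few, per_item_limit=10, total_threshold=1500))
```

Only the first case qualifies. Nothing here is being sliced off a larger whole and nothing escalates over time, which is the sense in which this structure is distinct from salami slicing and frogboiling.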

Comment by TsviBT on Sum-threshold attacks · 2023-09-09T11:35:22.732Z · LW · GW

Thanks, I didn't know the frog thing wasn't true.

I'm confused by your claim that the other examples aren't real... That seems so obviously false that maybe I misunderstand.

The examples:

  1. The vector thing. I take it you're not disputing the math, but saying the math doesn't describe a situation that happens in life?
  2. Verbal abuse. This one totally happens. Happened to me, happened to lots of other people. There's lots of books that describe what this looks like.
  2.5. General social pressure. Don't people get social pressured into actions, roles, and opinions via shallowbroad channels all the time without being aware it's happening and without being able to say when or how it happened?
  3. DDoS. I assume this one happens, I've heard people discuss it happening and it's got a wiki page and everything. Are you saying there aren't DDoS attacks? Or are you saying that the person being DDoSed is aware that they are being DDoSed and aware of each user request? I agree with that; in this case the threshold isn't "did they notice", it's more like "is this particular user unambiguously part of the attack, such that it makes sense to ban them or sue them". Regardless of that, it has the underlying anastomosis structure.
  4. Systemic oppression. Are you claiming this isn't a thing that happens? To get a sense for what it's like, you could look at for example Alice's Adventures in Numberland which details a bunch of examples--subtle and not--of sexism in academia, experienced by the number theorist Alice Silverberg. Maybe you're saying it doesn't count because there's no agent?
  5. Adversarial image attacks. Are you saying the claims in the paper aren't true, or are you saying it's not an example of a sum-threshold attack because the perturbation is fragile / the coordinates depend on each other for their effect (plausible to me, but also plausibly it is), or for some other reason (what reason)?

Comment by TsviBT on Fundamental question: What determines a mind's effects? · 2023-09-07T23:48:24.738Z · LW · GW

I don't really like the block-universe thing in this context. Here "reversible" refers to a time-course that doesn't particularly have to be physical causality; it's whatever course of sequential determination is relevant. E.g., don't cut yourself off from acausal trades.

I think "reversible" definitely needs more explication, but until proven otherwise I think it should be taken on faith that the obvious intuition has something behind it.

Comment by TsviBT on The possible shared Craft of deliberate Lexicogenesis · 2023-09-07T23:30:18.369Z · LW · GW

It looks like having internal monologue is a spectrum, perhaps related to spectra of aphantasia

IDK about people who claim this. I'd want to look at what kinds of tasks / what kinds of thinking they are doing. For example, it makes sense to me for someone to "think with their body", e.g. figuring out how to climb up some object by sort of letting the motor coping skill play itself out. It's harder to imagine, say, doing physics without doing something that's very bound up with words. For reference, solving a geometric problem by visualizing things would probably still qualify, because the visualization and the candidate-solution-generator are probably structured by concepts that you only had because you had words.

optimized for covering a sufficiently diverse range of parameters of the aminoacid-space.

Interesting. Didn't know about that. That reminds me of phonemes.

Additional persons

Oh cool. Yeah, lojban might.

(Partially) parametrized concepts?

Neh. I mean to ask for a word for [a word that one person has used in two different ways--not because they are using the word totally inconsistently, using it in two different ways in the same context, but because they are using the word differently in different contexts--but in some sense they "ought" to either use the word in "the same way" in both contexts, or else use two different words; they are confusing themselves, acting as though they think that they are using the word in the same way across different contexts]. (This requires some analogy / relation between the two contexts, or else there's no way to say when someone uses a word "the same way".)

Overall, I'm slightly surprised by no mention of dath ilan, as they seem to have invested quite a lot of labor-hours into optimizing language, including in some of the directions you sketch out in this post.

All I've read about dath ilan is the thing about moving houses around on wires. Where is it described what they do with language?

Comment by TsviBT on Fundamental question: What determines a mind's effects? · 2023-09-07T23:15:12.686Z · LW · GW

It's definitely consciously meditative. It's a form of meditation I call "redescription". You redescribe the thing over and over--emphasizing different aspects, holding different central examples in mind, maybe tabooing words you used previously--like running your hands over an object over and over, making it familiar / part of you.

IDK about koans. A favorite intro / hook / source?

Comment by TsviBT on Gemini modeling · 2023-09-03T16:22:41.280Z · LW · GW

Basically, yeah.

A maybe trivial note: You switched the notation; I used Xp to mean "a part of the whole thing" and X is "the whole thing, the whole context of Xp", and then [Xp] to denote the model / twin of Xp. X would be all of B, or enough of B to make Xp the sort of thing that Xp is.

A less trivial note: It's a bit of a subtle point (I mean, a point I don't fully understand), but: I think it's important that it's not just "the relevant connections are reflected by analogous connections". (I mean, "relevant" is ambiguous and could mean what gemini modeling is supposed to mean.) But anyway, the point is that to be gemini modeling, the criterion isn't about reflecting any specific connections. Instead the criterion is providing connections enough so that the gemini model [Xp] is rendered "the same sort of thing" as what's being gemini modeled Xp. E.g., if Xp is a belief that B has, then [Xp] as an element of A has to be treated by A in a way that makes [Xp] play the role of a belief in A. And further, the Thing that Xp in B "wants to be"--what it would unfold into, in B, if B were to investigate Xp further--is supposed to also be the same Thing that [Xp] in A would unfold into in A if A were to investigate [Xp] further. In other words, A is supposed to provide the context for [Xp] that makes [Xp] be "the same pointer" as Xp is for B.

Comment by TsviBT on Please don't throw your mind away · 2023-09-03T16:07:02.141Z · LW · GW

Yep, that turns out to be the case! Jason Gross also pointed this out to me. I didn't know it when I wrote that, so I guess it's a good example at least from my perspective.

Comment by TsviBT on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T22:55:48.253Z · LW · GW

Not what I mean by analogies.