Posts

TsviBT's Shortform 2024-06-16T23:22:54.134Z
Koan: divining alien datastructures from RAM activations 2024-04-05T18:04:57.280Z
What could a policy banning AGI look like? 2024-03-13T14:19:07.783Z
A hermeneutic net for agency 2024-01-01T08:06:30.289Z
What is wisdom? 2023-11-14T02:13:49.681Z
Human wanting 2023-10-24T01:05:39.374Z
Hints about where values come from 2023-10-18T00:07:58.051Z
Time is homogeneous sequentially-composable determination 2023-10-08T14:58:15.913Z
Telopheme, telophore, and telotect 2023-09-17T16:24:03.365Z
Sum-threshold attacks 2023-09-08T17:13:37.044Z
Fundamental question: What determines a mind's effects? 2023-09-03T17:15:41.814Z
Views on when AGI comes and on strategy to reduce existential risk 2023-07-08T09:00:19.735Z
The fraught voyage of aligned novelty 2023-06-26T19:10:42.195Z
Provisionality 2023-06-19T11:49:06.680Z
Explicitness 2023-06-12T15:05:04.962Z
Wildfire of strategicness 2023-06-05T13:59:17.316Z
The possible shared Craft of deliberate Lexicogenesis 2023-05-20T05:56:41.829Z
A strong mind continues its trajectory of creativity 2023-05-14T17:24:00.337Z
Better debates 2023-05-10T19:34:29.148Z
An anthropomorphic AI dilemma 2023-05-07T12:44:48.449Z
The voyage of novelty 2023-04-30T12:52:16.817Z
Endo-, Dia-, Para-, and Ecto-systemic novelty 2023-04-23T12:25:12.782Z
Possibilizing vs. actualizing 2023-04-16T15:55:40.330Z
Expanding the domain of discourse reveals structure already there but hidden 2023-04-09T13:36:28.566Z
Ultimate ends may be easily hidable behind convergent subgoals 2023-04-02T14:51:23.245Z
New Alignment Research Agenda: Massive Multiplayer Organism Oversight 2023-04-01T08:02:13.474Z
Descriptive vs. specifiable values 2023-03-26T09:10:56.334Z
Shell games 2023-03-19T10:43:44.184Z
Are there cognitive realms? 2023-03-12T19:28:52.935Z
Do humans derive values from fictitious imputed coherence? 2023-03-05T15:23:04.065Z
Counting-down vs. counting-up coherence 2023-02-27T14:59:39.041Z
Does novel understanding imply novel agency / values? 2023-02-19T14:41:40.115Z
Please don't throw your mind away 2023-02-15T21:41:05.988Z
The conceptual Doppelgänger problem 2023-02-12T17:23:56.278Z
Control 2023-02-05T16:16:41.015Z
Structure, creativity, and novelty 2023-01-29T14:30:19.459Z
Gemini modeling 2023-01-22T14:28:20.671Z
Non-directed conceptual founding 2023-01-15T14:56:36.940Z
Dangers of deference 2023-01-08T14:36:33.454Z
The Thingness of Things 2023-01-01T22:19:08.026Z
[link] The Lion and the Worm 2022-05-16T20:40:22.659Z
Harms and possibilities of schooling 2022-02-22T07:48:09.542Z
Rituals and symbolism 2022-02-10T16:00:14.635Z
Index of some decision theory posts 2017-03-08T22:30:05.000Z
Open problem: thin logical priors 2017-01-11T20:00:08.000Z
Training Garrabrant inductors to predict counterfactuals 2016-10-27T02:41:49.000Z
Desiderata for decision theory 2016-10-27T02:10:48.000Z
Failures of throttling logical information 2016-02-24T22:05:51.000Z
Speculations on information under logical uncertainty 2016-02-24T21:58:57.000Z
Existence of distributions that are expectation-reflective and know it 2015-12-10T07:35:57.000Z

Comments

Comment by TsviBT on "Slow" takeoff is a terrible term for "maybe even faster takeoff, actually" · 2024-09-30T17:56:25.999Z · LW · GW

Depends on context; I guess by raw biomass, it's bad because those phrases would probably indicate that people aren't really thinking and they should taboo those phrases and ask why they wanted to discuss them? But if that's the case and they haven't already done that, maybe there's a more important underlying problem, such as Sinclair's razor.

Comment by TsviBT on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-30T13:38:59.545Z · LW · GW

The paragraph you quoted

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or two. You can imagine a version of Stockfish which does that -- a chessplayer which, if it's sure it can win anyways, will start letting you have a pawn or two -- but it's not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.

is saying that when you make a [thing that achieves very impressive things / strongly steers the world], it probably [in general sucks up all the convergent instrumental resources] because that's simpler than [sucking up all the convergent instrumental resources except in certain cases unrelated to its terminal goals].

Humanity getting a sliver of the Sun's energy for the next million years, would be a noticeable waste of convergent instrumental resources from the AI's perspective. Humanity getting a sliver of the Sun's energy while the nanobots are infecting our bloodstream, in order that we won't panic, and then later sucking up all the Sun's energy, is just good tactics; letting you sac your bishop for a pawn for no reason is analogous.

You totally can rewrite Stockfish so that it genuinely lets you win material, but is still unbeatable. You just check: is the evaluation >+20 for Stockfish right now, and will it stay >+15 if I sac this pawn for no benefit? If so, sac the pawn for no benefit. This would work. The point is that it's more complicated, and you have to know something about how Stockfish works, and it's only stable because Stockfish doesn't have robust self-improvement optimization channels.
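
To make the check concrete, here is a minimal sketch of such a wrapper (not anything Stockfish ships, and not code from this thread). It assumes the python-chess library and a local Stockfish binary on PATH; the +20/+15 thresholds come from the comment (expressed in centipawns), while the function names and the simplified "play the weakest move that is still clearly winning" heuristic are illustrative stand-ins for "sac a pawn for no benefit".

```python
import chess
import chess.engine

# Sketch only: assumes python-chess and a Stockfish binary on PATH.
# Thresholds are the comment's +20 / +15, in centipawns.
WIN_MARGIN = 2000
KEEP_MARGIN = 1500

def eval_cp(engine, board, color, depth=16):
    """Centipawn evaluation of `board` from `color`'s point of view."""
    info = engine.analyse(board, chess.engine.Limit(depth=depth))
    return info["score"].pov(color).score(mate_score=100_000)

def magnanimous_move(engine, board, depth=16):
    """Play like Stockfish, except: if already overwhelmingly winning,
    play the weakest legal move that still keeps the eval above +15."""
    us = board.turn
    best = engine.play(board, chess.engine.Limit(depth=depth)).move
    if eval_cp(engine, board, us, depth) <= WIN_MARGIN:
        return best  # not clearly winning: fight for every pawn as usual
    safe_moves = []
    for move in list(board.legal_moves):
        board.push(move)
        score = eval_cp(engine, board, us, depth)
        board.pop()
        if score > KEEP_MARGIN:
            safe_moves.append((score, move))
    if safe_moves:
        # Most "generous" move that still stays comfortably winning.
        return min(safe_moves, key=lambda sm: sm[0])[1]
    return best

# Usage (requires a Stockfish install):
# engine = chess.engine.SimpleEngine.popen_uci("stockfish")
# board = chess.Board()
# board.push(magnanimous_move(engine, board))
# engine.quit()
```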

Comment by TsviBT on "Slow" takeoff is a terrible term for "maybe even faster takeoff, actually" · 2024-09-30T07:55:13.761Z · LW · GW

Not sure I understand your question. If you mean just what I think is the case about FOOM:

  • Obviously, there's no strong reason humans will stay coupled with an AGI. The AGI's thoughts will be highly alien--that's kinda the point.
  • Obviously, new ways of thinking recursively beget powerful new ways of thinking. This is obvious from the history of thinking and from introspection. And obviously this goes faster and faster. And obviously will go much faster in an AGI.
  • Therefore, from our perspective, there will be a fast-and-sharp FOOM.
  • But I don't really know what to think about Christiano-slow takeoff.
    • I.e. a 4-year GDP doubling before a 1-year GDP doubling.
    • I think Christiano agrees that there will later be a sharp/fast/discontinuous(??) FOOM, but he thinks things will get really weird and fast before that point. To me this is vaguely in the genre of trying to predict whether you can usefully get nuclear power out of a pile without setting off a massive explosion, when you've only heard conceptually about the idea of nuclear decay. But I imagine Christiano actually did some BOTECs to get the numbers "4" and "1".
    • If I were to guess at where I'd disagree with Christiano: Maybe he thinks that in the slow part of the slow takeoff, humans can make a bunch of progress on aligning / interfacing with / getting work out of AI stuff, to such an extent that from those future humans' perspectives, the fast part of the slow takeoff will actually be slow, in the relative sense. In other words, if the fast part came today, it would be fast, but if it came later, it would be slow, because we'd be able to keep up. Whereas I think aligning/interfacing, in the part where it counts, is crazy hard, and doesn't especially have to be coupled with nascent-AGI-driven capabilities advances. A lot of Christiano's work has (explicitly) a strategy-stealing flavor: if capability X exists, then we / an aligned thingy should be able to steal the way to do X and do it alignedly. If you think you can do that, then it makes sense to think that our understanding will be coupled with AGI's understanding.
Comment by TsviBT on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-30T07:35:27.429Z · LW · GW

This nonlinearity also seems strange to have, without also accepting quantum-immortality-type arguments. In particular, you only need to bargain for UFAIs to kill all humans painlessly and instantaneously; and then you just simulate those same humans yourself. (And if you want to save on compute, you can flip quantum coins for a bit.) Maybe it makes sense to have this nonlinearity but not accept this--I'd be curious to see what that position looks like.

Comment by TsviBT on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-30T06:54:32.688Z · LW · GW

They got way more of the Everett branches, so to speak. Suppose that the Pseudosuchians had a 20% chance of producing croc-FAI. So starting at the Triassic, you have that 20% of worlds become croc-god worlds, and 80% become a mix of X-god worlds for very many different Xs; maybe only 5% of worlds produce humans, and only .01% produce Humane-gods.
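
To show the size of the asymmetry, here is the same toy arithmetic in one place, using only the comment's explicitly hypothetical numbers:

```python
# Hypothetical branch fractions from the comment, measured from the Triassic.
p_croc_god   = 0.20     # Pseudosuchians go on to produce croc-FAI
p_human      = 0.05     # branches that produce humans at all
p_humane_god = 0.0001   # branches that produce Humane-gods (.01%)

print(p_croc_god / p_humane_god)   # 2000.0 -- Croc-gods hold ~2000x the branch measure of Humane-gods
print(p_croc_god / p_human)        # 4.0    -- and 4x the measure of humans existing at all
```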

Maybe doing this with Pseudosuchians is less plausible than with humans because you can more easily model what Humane-gods would bargain for, because you have access to humans. But that's eyebrow-raising. What about Corvid-gods, etc. If you can do more work and get access to vastly more powerful acausal trade partners, seems worth it; and, on the face of it, the leap from [acausal trade is infeasible, period] to [actually acausal trade with hypothetical Humane-gods is feasible] seems bigger than the jump from [trade with Humane-gods is feasible] to [trade with Corvid-gods is feasible] or [trade with Cetacean-gods is feasible], though IDK of course. (Then there's the jump to [trade with arbitrary gods from the multiverse]. IDK.)

Comment by TsviBT on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-30T06:00:57.498Z · LW · GW

That's not when you consider it; you consider it at the first point when you could make agreements with your simulators. But some people think that you can already do this; if you think you can already do this, then you should right now stop being mean to corvids because the Corvid-god would want to give you a substantial amount of what you like in exchange for you stopping ASAP being mean to corvids.

Comment by TsviBT on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-29T22:33:19.280Z · LW · GW

Nevermind, I was confused, my bad. Yeah you can save a lot more than 10% of the Earths.

As a separate point, I do worry that some other nonhumane coalition has vastly more bargaining power compared to the humane one, by virtue of happening 10 million years ago or whatever. In this case, AIs would tend to realize this fact, and then commit-before-simulation-aware to "figure out what the dominant coalition wants to trade about".

Comment by TsviBT on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-29T22:10:19.317Z · LW · GW

Smarter animals (or rather, smarter animals from, say, 50 million years ago) have a higher fraction of the lightcone under the ownership of their descendants who invented friendly AGI, right? They might want to bargain with human-owned FAI universes.

Comment by TsviBT on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-29T22:08:11.533Z · LW · GW

I'm not figuring it out enough to fully clarify, but: I feel there's some sort of analysis missing here, which would clarify some of the main questions. Something around: What sorts of things can you actually bargain/negotiate/trade for, when the only thing that matters is differences of value? (As opposed to differences of capability.)

  • On the one hand, you have some severe "nonlinearities" (<-metaphor, I think? really I mean "changes in behavior-space that don't trade off very strongly between different values").
    • E.g. we might ask the AI: hey, you are running simulations of the humans you took Earth from. You're torturing them horribly for thousands of years. But look, you can tweak your sims, and you get almost as much of the info you wanted, but now there's no suffering. Please do this (at very low cost to you, great benefit to us) and we'll give you a planet (low cost to us, low benefit to you).
  • On the other hand, you have direct tradeoffs.
    • E.g., everybody needs a Thneed. You have a Thneed. You could give it to me, but that would cost you 1 Thneed and gain me 1 Thneed. This is of negative value (transaction costs). E.g. energy, matter, etc.
    • "Just leave them the solar system" is asking for a trade of Thneeds. Everybody wants to eat Earth.
    • If humane civilization gets 10% of (some subset, starting from some earlier checkpoint, of...?) the lightcone, then they can bargain for at most 10% of other Earths to survive, right? And probably a lot less.
    • This seems to lead to the repugnant conclusion, where humanity is 80% dead or worse; 10% meager existence on a caged Earth; and 10% custodians of a vast array of AIs presiding over solar systems. (Rough arithmetic sketched just after this list.)
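
A rough back-of-envelope of that split, using only the comment's (hypothetical) numbers; the variable names and the 1:1 planet-for-planet trade assumption are just illustrative:

```python
# Hypothetical numbers from the comment above.
p_humane_win = 0.10            # branches where humane civilization ends up holding (a chunk of) the lightcone
earths_buyable = p_humane_win  # planets trade roughly 1:1 ("Thneed for Thneed"): at most 10%, probably less

split = {
    "custodians of FAI solar systems": p_humane_win,                          # ~10%
    "meager existence on a caged Earth (bought via trade)": earths_buyable,   # at most ~10%
    "dead or worse": 1 - p_humane_win - earths_buyable,                       # ~80%
}
print(split)
```
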
Comment by TsviBT on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-29T21:39:45.008Z · LW · GW

So far, my tentative conclusion is that believing that we are probably in a simulation shouldn't really affect our actions.

Well, you should avoid doing things that are severely offensive to Corvid-god and Cetacean-god and Neanderthal-god and Elephant-god, etc., at least to an extent comparable to how you think an AI should orient itself toward monkeys if it thinks it's in your simulation.

Comment by TsviBT on Cryonics is free · 2024-09-29T21:20:17.621Z · LW · GW

Right, but you might prefer

  • living now >
  • not living, no chance of revival or torture >
  • not living, chance of revival later and chance of torture
Comment by TsviBT on "Slow" takeoff is a terrible term for "maybe even faster takeoff, actually" · 2024-09-29T08:54:58.650Z · LW · GW

A thing that didn't appear on your list, and which I think is pretty important (cruxy for a lot of discussions; closest to what Hanson meant in the FOOM debate), is "human-relative discontinuity/speed". Here the question is something like: "how much faster does AI get smarter, compared to humans?". There's conceptual confusion / talking past each other in part because one aspect of the debate is:

  • how much locking force there is between AI and humans (e.g. humans can learn from AIs teaching them, can learn from AIs' internals, can use AIs, and humans share ideas with other humans about AI (this was what Hanson argued))

and the other aspect is

  • how fast does an intelligence explosion go, by the stars (sidereal).

If you think there's not much coupling, then sidereal speed is the crux about whether takeoff will look discontinuous. But if you think there's a lot of coupling, then you might think something else is a crux about continuity, e.g. "how big are the biggest atomic jumps in capability".

Comment by TsviBT on peterbarnett's Shortform · 2024-09-27T10:02:47.485Z · LW · GW

This is sort of what "calibration" means. Maybe you could say the event is "in-calibration".

Comment by TsviBT on Stanislav Petrov Quarterly Performance Review · 2024-09-26T22:38:50.221Z · LW · GW

dignify


sorry did you mean ignify

Comment by TsviBT on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T16:20:29.247Z · LW · GW
Comment by TsviBT on A Path out of Insufficient Views · 2024-09-25T19:59:49.693Z · LW · GW

Being ignorant, I can't respond in detail. It makes sense that there'd be variation between ideologies, and that many people would have versions that are less, or differently, bad (according to me, on this dimension). But I would also guess that I'd find deep disagreements in more strands, if I knew more about them, that are related to motive dismantling.

For example, I'd expect many strands to incorporate something like the negation

  • of "Reality bites back." or
  • of "Reality is (or rather, includes quite a lot of) that which, when you stop believing in it, doesn't go away." or
  • of " We live in the world beyond the reach of God.".

As another example, I would expect most Buddhists to say that you move toward unity with God (however you want to phrase that) by in some manner becoming less {involved with / reliant on / constituted by / enthralled by / ...} symbolic experience/reasoning, but I would fairly strongly negate this, and say that you can only constitute God via much more symbolic experience/reasoning.

Comment by TsviBT on A Path out of Insufficient Views · 2024-09-25T18:39:22.289Z · LW · GW

It says to avoid suffering by dismantling your motives. Some people act on that advice and then don't try to do things and therefore don't do things. Also so far no one has pointed out to me someone who's done something I'd recognize as good and impressive, and who credibly attributes some of that outcome to Buddhism. (Which is a high bar; what other cherished systems wouldn't reach that bar? But people make wild claims about Buddhism.)

Comment by TsviBT on A Path out of Insufficient Views · 2024-09-24T21:02:18.164Z · LW · GW

(Buddhism seems generally mostly unhelpful and often antihelpful, but) What you say here is very much not giving the problem its due. Our problems are not Cartesian--we care about ourselves and each other, and are practically involved with ourselves and each other; and ourselves and each other are diagonalizey, self-createy things. So yes, a huge range of questions can be answered, but there will always be questions that you can't answer. I would guess furthermore that in a relevant sense, there will always be deep / central / important / salient / meaningful questions that aren't fully satisfactorily answered; but that's less clear.

Comment by TsviBT on Struggling like a Shadowmoth · 2024-09-24T17:20:02.314Z · LW · GW

There's also the question of the non-helper as an instance of a class, and you as an instance of a class, and the resulting implied ecology. Or to say it a different way: apply TDT to the shadowmoth / meal question. To say it a third way: if people like me react to situations like this--involving some relationship with someone or something--in such-and-such a way, then what trophic niche are we opening up, i.e. what sort of food are we making available for what sort of predator?

Comment by TsviBT on Struggling like a Shadowmoth · 2024-09-24T17:15:30.238Z · LW · GW

A concrete example might be the story in your post. I'm not familiar with the book but from reading a summary, it sounds like Jacen was being eaten! And was convinced to collaborate with his own consumption by the very same shadowmoth story!

Comment by TsviBT on Struggling like a Shadowmoth · 2024-09-24T08:05:32.158Z · LW · GW

Sometimes yes, but also this is a great and common excuse to be eaten.

Comment by TsviBT on Why I funded PIBBSS · 2024-09-19T17:53:30.116Z · LW · GW

I didn't take you to be doing so--it's a reminder for the future.

Comment by TsviBT on Why I funded PIBBSS · 2024-09-19T12:15:35.695Z · LW · GW

"You are [slipping sideways out of reality], and this is bad! Stop it!"

who is 'slipping sideways out of reality' to caveat their communications with an explicit disclaimer that admits that they are doing so

Excuse me, none of that is in my comment.

Comment by TsviBT on Why I funded PIBBSS · 2024-09-19T10:40:25.699Z · LW · GW

IDK how to understand your comment as referring to mine. To clarify the "slipping sideways" thing, I'm alluding to "stepping sideways" described in Q2 here: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy#Q2___I_have_a_clever_scheme_for_saving_the_world___I_should_act_as_if_I_believe_it_will_work_and_save_everyone__right__even_if_there_s_arguments_that_it_s_almost_certainly_misguided_and_doomed___Because_if_those_arguments_are_correct_and_my_scheme_can_t_work__we_re_all_dead_anyways__right_

and from

https://www.lesswrong.com/posts/m6dLwGbAGtAYMHsda/epistemic-slipperiness-1#Subtly_Bad_Jokes_and_Slipping_Sideways

Comment by TsviBT on Why I funded PIBBSS · 2024-09-19T01:59:29.227Z · LW · GW

Reminder that you have a moral obligation, every single time you're communicating an overall justification of alignment work premised on slow takeoff, in a context where you can spare two sentences without unreasonable cost, to say out loud something to the effect of "Oh and by the way, just so you know, the causal reason I'm talking about this work is that it seems tractable, and the causal reason is not that this work matters.". If you don't, you're spraying your [slipping sideways out of reality] on everyone else.

Comment by TsviBT on Book review: Xenosystems · 2024-09-17T22:58:18.474Z · LW · GW

Right, your "obliqueness thesis" seems like a reasonable summary slogan. I'm lamenting that there are juicy problems here, but it's hard to discuss them theoretically because theoretical discussions are attracted to the two poles.

E.g. when discussing ontic crises, some people's first instinct is to get started on translating/reducing the new worldspace into the old worldspace--this is the pole that takes intelligence as purely instrumental. Or on the other pole, you have the nihilism -> Landian pipeline--confronted with ontic crises, you give up and say "well, whatever works". Both ways shrug off the problem/opportunity of designing/choosing/learning what to be. (I would hope that Heidegger would discuss this explicitly somewhere, but I'm not aware of it.)

In terms of government, you have communists/fascists on the one hand, and minarchists on the other. The founders of the US were neither and thought a lot about what to be. You don't just pretend that you aren't, shouldn't be, don't want to be part of a collective; but that collective should be deeply good; and to be deeply good it has to think; so it can't be totalitarian.

Comment by TsviBT on Book review: Xenosystems · 2024-09-17T22:23:01.993Z · LW · GW

It's pretty annoying that the only positions with common currency are

  1. we have to preserve our values the way they are, and
  2. actually that's confused, so we should just do whatever increases intelligence / effectiveness.

To have goals you have to point to reality, and to point to reality you have to unfold values through novelty. True, true. And you have to make free choices at each ontic crisis, including free choices about what to be. Also true.

Comment by TsviBT on Why I funded PIBBSS · 2024-09-17T07:43:40.075Z · LW · GW

(I have a lot of disagreements with everyone lol, but I appreciate Ryan putting some money where his mouth is re/ blue sky alignment research as a broad category, and the acknowledgement of "rather than the ideal 12-24 months" re/ "connectors".)

Comment by TsviBT on Wei Dai's Shortform · 2024-08-28T15:25:55.608Z · LW · GW

From scratch but not from scratch. https://www.lesswrong.com/posts/noxHoo3XKkzPG6s7E/most-smart-and-skilled-people-are-outside-of-the-ea?commentId=DNvmP9BAR3eNPWGBa

https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html

Comment by TsviBT on One person's worth of mental energy for AI doom aversion jobs. What should I do? · 2024-08-27T19:18:05.024Z · LW · GW

Sure, though if you're just going to say "I know how to do it! Also I won't tell you!" then it doesn't seem very pointful?

Comment by TsviBT on Wei Dai's Shortform · 2024-08-26T22:24:32.480Z · LW · GW

@Nate Showell @P. @Tetraspace @Joseph Miller @Lorxus 

I genuinely don't know what you want elaboration of. Reacts are nice for what they are, but saying something out loud about what you want to hear more about / what's confusing  / what you did and didn't understand/agree with, is more helpful.

Re/ "to whom not...", I'm asking Wei: what groups of people would not be described by the list of 6 "underestimating the difficult of philosophy" things? It seems to me that broadly, EAs and "AI alignment" people tend to favor somewhat too concrete touchpoints like "well, suppressing revolts in the past has gone like such and such, so we should try to do similar for AGI". And broadly they don't credit an abstract argument about why something won't work, or would only work given substantial further philosophical insight. 

Re/ "don't think thinking ...", well, if I say "LLMs basically don't think", they're like "sure it does, I can keep prompting it and it says more things, and I can even put that in a scaffold" or "what concrete behavior can you point to that it can't do".  Like, bro, I'm saying it can't think. That's the tweet. What thinking is, isn't clear, but That thinking is should be presumed, pending a forceful philosophical conceptual replacement!

Comment by TsviBT on One person's worth of mental energy for AI doom aversion jobs. What should I do? · 2024-08-26T16:30:15.958Z · LW · GW

https://tsvibt.blogspot.com/2023/07/views-on-when-agi-comes-and-on-strategy.html#things-that-might-actually-work

Comment by TsviBT on Wei Dai's Shortform · 2024-08-26T14:32:01.425Z · LW · GW

If you say to someone

Ok, so, there's this thing about AGI killing everyone. And there's this idea of avoiding that by making AGI that's useful like an AGI but doesn't kill everyone and does stuff we like. And you say you're working on that, or want to work on that. And what you're doing day to day is {some math thing, some programming thing, something about decision theory, ...}. What is the connection between these things?

and then you listen to what they say, and reask the question and interrogate their answers, IME what it very often grounds out into is something like:

Well, I don't know what to do to make aligned AI. But it seems like X ∈ {ontology, decision, preference function, NN latent space, logical uncertainty, reasoning under uncertainty, training procedures, negotiation, coordination, interoperability, planning, ...} is somehow relevant.

And, I have a formalized version of some small aspect of X which is mathematically interesting / philosophically intriguing / amenable to testing with a program, and which seems like it's kinda related to X writ large. So what I'm going to do, is I'm going to tinker with this formalized version for a week/month/year, and then I'm going to zoom out and think about how this relates to X, and what I have and haven't learned, and so on.

This is a good strategy because this is how all mathematical / scientific / technological progress is made: you start with stuff you know; you expand outwards by following veins of interest, tractability, and generality/power; you keep an eye roughly towards broader goals by selecting the broad region you're in; and you build outward. What we see historically is that this process tends to lead us to think about the central / key / important / difficult / general problems--such problems show up everywhere, so we convergently will come to address them in due time. By mostly sticking, in our day-to-day work, to things that are relatively more concrete and tractable--though continually pushing and building toward difficult things--we make forward progress, sharpen our skills, and become familiar with the landscape of concepts and questions.

So I would summarize that position as endorsing streetlighting, in a very broad sense that encompasses most math / science / technology. And this position is largely correct! My claim is that

  1. this is probably too slow for making Friendly AI, and
  2. maybe one could go faster by trying to more directly cleave to the core philosophical problems.

I discuss the problem more here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html

(But note that, while that essay frames things as "a proposed solution", the solution is barely anything--more like a few guesses at pieces of methodology--and the main point is the discussion of the problem; maybe a writing mistake.)

An underemphasized point that I should maybe elaborate more on: a main claim is that there's untapped guidance to be gotten from our partial understanding--at the philosophical level and for the philosophical level. In other words, our preliminary concepts and intuitions and propositions are, I think, already enough that there's a lot of progress to be made by having them talk to each other, so to speak.

Comment by TsviBT on Wei Dai's Shortform · 2024-08-26T01:07:58.894Z · LW · GW

The type of fundamental problem that proper speculative philosophy is supposed to solve is the sort where streetlighting doesn't work (or isn't working, or isn't working fast enough). But nearly all of the alignment field after like 2004 was still basically streetlighting. It was maybe a reasonable thing to have some hope in prospectively, but retrospectively it was too much investment in streetlighting, and retrospectively I can make arguments about why one should have maybe guessed that at the time. By 2018 IIRC, or certainly by 2019, I was vociferously arguing for that in AF team meetings--but the rest of the team either disagreed with me or didn't understand me, and on my own I'm just not that good a thinker, and I didn't find anyone else to try it with. I think they have good thoughts, but are nevertheless mostly streetlighting--i.e. not trying to take step after step of thinking at the level of speculative philosophy AND aimed at getting the understanding needed for alignment.

Comment by TsviBT on Wei Dai's Shortform · 2024-08-25T22:27:20.486Z · LW · GW

Yeah that was not my reaction. (More like "that's going to be the most beautiful thing ever" and "I want to be that too".)

more cautious/modest/self-critical about proposing new philosophical solutions

No, if anything the job loss resulted from not doing so much more, much more intently, and much sooner.

Comment by TsviBT on Wei Dai's Shortform · 2024-08-25T19:14:44.720Z · LW · GW

To whom does this not apply? Most people who "work on AI alignment" don't even think that thinking is a thing.

Comment by TsviBT on RobertM's Shortform · 2024-08-24T23:09:43.755Z · LW · GW

True (but obvious) taken literally. But if you also mean it's good to show sympathy by changing your stance in the discourse, such as by reallocating private or shared attention, it's not always true. In particular, many responses you implement could be exploited.

For example, say I'm ongoingly doing something bad, and whenever you try to talk to me about it, I "get upset". In this case, I'm probably actually upset, probably for multiple reasons; and probably a deep full empathic understanding of the various things going on with me would reveal that, in some real ways, I have good reason to be upset / there's something actually going wrong for me. But now say that your response to me "getting upset" is to allocate our shared attention away from the bad thing I'm doing. That may indeed be a suitable thing to do; e.g., maybe we can work together to understand what I'm upset about, and get the good versions of everything involved. However, hopefully it's clear how this could be taken advantage of--sometimes even catastrophically, if, say, you are for some reason very committed to the sort of cooperativeness that keeps reallocating attention this way, even to the ongoing abjection of your original concern for the thing I was originally and am ongoingly doing bad. (This is a nonfictional though intentionally vague example.)

Comment by TsviBT on Zach Stein-Perlman's Shortform · 2024-08-24T20:37:16.080Z · LW · GW

(I won't reply more, by default.)

various facts about Anthropic mean that them-making-powerful-AI is likely better than the counterfactual, and evaluating a lab in a vacuum or disregarding inaction risk is a mistake

Look, if Anthropic was honestly and publicly saying

We do not have a credible plan for how to make AGI, and we have no credible reason to think we can come up with a plan later. Neither does anyone else. But--on the off chance there's something that could be done with a nascent AGI that makes a nonomnicide outcome marginally more likely, if the nascent AGI is created and observed by people who are at least thinking about the problem--on that off chance, we're going to keep up with the other leading labs. But again, given that no one has a credible plan or a credible credible-plan plan, better would be if everyone including us stopped. Please stop this industry.

If they were saying and doing that, then I would still raise my eyebrows a lot and wouldn't really trust it. But at least it would be plausibly consistent with doing good.

But that doesn't sound like either what they're saying or doing. IIUC they lobbied to remove protection for AI capabilities whistleblowers from SB 1047! That happened! Wow! And it seems like Zac feels he has to pretend to have a credible credible-plan plan.

Comment by TsviBT on Zach Stein-Perlman's Shortform · 2024-08-24T19:33:42.778Z · LW · GW

Hm. I imagine you don't want to drill down on this, but just to state for the record, this exchange seems like something weird is happening in the discourse. Like, people are having different senses of "the point" and "the vibe" and such, and so the discourse has already broken down. (Not that this is some big revelation.) Like, there's the Great Stonewall of the AGI makers. And then Zac is crossing through the gates of the Great Stonewall to come and talk to the AGI please-don't-makers. But then Zac is like (putting words in his mouth) "there's no Great Stonewall, or like, it's not there in order to stonewall you in order to pretend that we have a safe AGI plan or to muddy the waters about whether or not we should have one, it's there because something something trade secrets and exfohazards, and actually you're making it difficult to talk by making me work harder to pretend that we have a safe AGI plan or intentions that should promissorily satisfy the need for one".

Comment by TsviBT on Zach Stein-Perlman's Shortform · 2024-08-24T19:16:20.378Z · LW · GW

@Zach Stein-Perlman , you're missing the point. They don't have a plan. Here's the thread (paraphrased in my words):

Zach: [asks, for Anthropic]
Zac: ... I do talk about Anthropic's safety plan and orientation, but it's hard because of confidentiality and because many responses here are hostile. ...
Adam: Actually I think it's hard because Anthropic doesn't have a real plan. 
Joseph: That's a straw-man. [implying they do have a real plan?]
Tsvi: No it's not a straw-man, they don't have a real plan.
Zach: Something must be done. Anthropic's plan is something. 
Tsvi: They don't have a real plan. 
 

Comment by TsviBT on Viliam's Shortform · 2024-08-24T19:02:03.448Z · LW · GW

Saying whether "something" "is" "stupid" is sort of confused. If I run algorithm X which produces concrete observable Y, and X is good and Y is bad, is Y stupid? When you say that Y is stupid, what are you referring to? Usually we don't even want to refer to [Y, and Y alone, to the exclusion of anything Y is entangled with / dependent on / productive of / etc.].

Comment by TsviBT on Zach Stein-Perlman's Shortform · 2024-08-24T18:57:19.340Z · LW · GW

most people believe (implicitly or explicitly) that empirical research is the only feasible path forward to building a somewhat aligned generally intelligent AI scientist.

I don't credit that they believe that. And, I don't credit that you believe that they believe that. What did they do, to truly test their belief--such that it could have been changed? For most of them the answer is "basically nothing". Such a "belief" is not a belief (though it may be an investment, if that's what you mean). What did you do to truly test that they truly tested their belief? If nothing, then yours isn't a belief either (though it may be an investment). If yours is an investment in a behavioral stance, that investment may or may not be advisable, but it would DEFINITELY be inadvisable to pretend to yourself that yours is a belief.

Comment by TsviBT on Zach Stein-Perlman's Shortform · 2024-08-24T17:40:18.773Z · LW · GW

But that's not a plan to ensure their uranium pile goes well.

Comment by TsviBT on Zach Stein-Perlman's Shortform · 2024-08-24T15:45:03.505Z · LW · GW

How is it a straw-man? How is the plan meaningfully different from that?

Imagine a group of people has already gathered a substantial amount of uranium, is already refining it, is already selling power generated by their pile of uranium, etc. And doing so right near and upwind of a major city. And they're shoveling more and more uranium onto the pile, basically as fast as they can. And when you ask them why they think this is going to turn out well, they're like "well, we trust our leadership, and you know we have various documents, and we're hiring for people to 'Develop and write comprehensive safety cases that demonstrate the effectiveness of our safety measures in mitigating risks from huge piles of uranium', and we have various detectors such as an EM detector which we will privately check and then see how we feel". And then the people in the city are like "Hey wait, why do you think this isn't going to cause a huge disaster? Sure seems like it's going to by any reasonable understanding of what's going on". And the response is "well we've thought very hard about it and yes there are risks but it's fine and we are working on safety cases". But... there's something basic missing, which is like, an explanation of what it could even look like to safely have a huge pile of superhot uranium. (Also in this fantasy world no one has ever done so and can't explain how it would work.)

Comment by TsviBT on Why you should be using a retinoid · 2024-08-20T03:16:11.980Z · LW · GW

(IDK anything about the underlying contingent facts, but:

  1. there's a large relative difference between .967 and .98; almost half as much distance to 1 (quick arithmetic sketched just after this list). If exposure is really bad, this difference could matter.
  2. If there's a damage repair mechanism with something like a rate of repair, that mechanism can either be overwhelmed or not overwhelmed by incoming damage--it's an almost discrete threshold. )
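
Spelling out the arithmetic in point 1 (the .967 and .98 are just the figures from the thread; the point is only about their distance to 1):

```python
# The comment's claim is about the complement, i.e. the distance to 1.
gap_a = 1 - 0.967   # 0.033
gap_b = 1 - 0.980   # 0.020
print(gap_b / gap_a)   # ~0.61 -- the .98 figure leaves roughly 40% less distance to 1
```
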
Comment by TsviBT on Extended Interview with Zhukeepa on Religion · 2024-08-19T03:48:31.076Z · LW · GW

I'll try a bit but it would take like 5000 words to fully elaborate, so I'd need more info on which part is unclear or not trueseeming.

One piece is thinking of individual humans vs collectives. If an individual can want in the fullest sense, then a collective is some sort of combination of wants from constituents--a reconciliation. If an individual can't want in the fullest sense, but a collective can, then: If you take several individuals with their ur-wants and create a collective with proper wants, then a proper want has been created de novo.

The theogenic/theopoetic faculty points at creating collectives-with-wants, but it isn't a want itself. A flowerbud isn't a flower.

The picture is complicated of course. For example, individual humans can do this process on their own somewhat, with themselves. And sometimes you do have a want, and you don't understand the want clearly, and then later come to understand the want more clearly. But part of what I'm saying is that many episodes that you could retrospectively describe that way are not really like that; instead, you had a flowerbud, and then by asking for a flower you called the flowerbud to bloom.

Comment by TsviBT on Extended Interview with Zhukeepa on Religion · 2024-08-18T21:14:52.517Z · LW · GW

I'm saying that a religious way of being is one where the minimal [thing that can want, in the fullest sense] is a collective.

Comment by TsviBT on Extended Interview with Zhukeepa on Religion · 2024-08-18T20:30:20.333Z · LW · GW

An idealized version would be like a magic box that's able to take in a bunch of people with conflicting preferences about how they ought to coordinate (for example, how they should govern their society), figure out a synthesis of their preferences,


(I didn't read most of the dialogue so this may be addressed elsewhere)

I think this is subtly but importantly wrong. I think what you're actually supposed to be trying to get at is more like creating preferences than reconciling preferences. 

Comment by TsviBT on Elizabeth's Shortform · 2024-08-03T23:50:17.050Z · LW · GW

This also sometimes implies that efforts to get out tend to suffer defeat in detail (https://en.wikipedia.org/wiki/Defeat_in_detail). If there's 4 factors amphicausally propagating themselves, and you intervene on 1 (one) factor (strongly and in the right direction), the other 3 factors might be enough to maintain the bad equilibrium anyway.

This can lead to boondoggling: you correctly perceive that intervention X is somehow relevant, and is somehow directionally correct and has some effect. A bit of X gets a small temporary good effect. So you do X more. It doesn't work. But maybe that's just because you didn't do X enough. So you invest even more in X. Since the bad equilibrium is confusing (no one root factor, in terms of factors you already understand) and out of sight, you don't know why more X doesn't work, so you don't have an intuitive reason to not think more X might help... so you just keep doing more X even though it doesn't get you out of the equilibrium.

Comment by TsviBT on What could a policy banning AGI look like? · 2024-08-02T16:53:59.664Z · LW · GW

Have there been serious (e.g. large fines, jail time, corporate dissolution) penalties (e.g. judicial or statutory) for large bodies (companies, contractors, government orgs) due to extreme negligence about some harm prospectively (without the harm having happened yet) and speculatively (where the harm has not actually ever happened)?

As a hypothetical example, suppose that nuclear regulation is informed by Scenario X, in which 100k people die. Scenario X is believed to happen if conditions A,B,C are met, so nuclear power companies are required to meet conditions ¬A,¬B,¬C. But then an inspector finds that ¬A and ¬B are not firmly met. So then the company is dissolved and the CEO is thrown in jail.

What are some extreme examples of this? (E.g. an extreme penalty, or where the negligence is unclear (prospective, speculative).)