quila's Shortform

post by quila · 2023-12-22T22:02:50.644Z · LW · GW · 178 comments


Comments sorted by top scores.

comment by quila · 2025-01-04T20:11:17.085Z · LW(p) · GW(p)

(edit 3: i'm not sure, but this text might be net-harmful to discourse)

i continue to feel so confused at what continuity led to some users of this forum asking questions like, "what effect will superintelligence have on the economy?" or otherwise expecting an economic ecosystem of superintelligences (e.g. 1 [LW · GW][1], 2 [LW · GW] (edit 2: I misinterpreted this question [LW(p) · GW(p)])).

it actually reminds me of this short story by davidad, in which one researcher on an alignment team has been offline for 3 months and comes back to find the others on the team saying things like "[Coherent Extrapolated Volition?] Yeah, exactly! Our latest model is constantly talking about how coherent he is. And how coherent his volitions are!". the parallel is that both are something i thought this forum would have seen as 'confused about the basics' just a year ago, and i don't yet understand what led to it.

(edit: i'm feeling conflicted about this shortform after seeing it upvoted this much. the above paragraph would be unsubstantive/bad discourse if read as an argument by analogy, which i'm worried it was (?). i was mainly trying to express confusion.)

from the power of intelligence [LW · GW] (actually, i want to quote the entire post, it's short):

I keep trying to explain to people that the archetype of intelligence is not Dustin Hoffman in Rain Man. It is a human being, period. It is squishy things that explode in a vacuum, leaving footprints on their moon. Within that gray wet lump is the power to search paths through the great web of causality, and find a road to the seemingly impossible—the power sometimes called creativity.

People—venture capitalists in particular—sometimes ask how, if the Machine Intelligence Research Institute successfully builds a true AI, the results will be commercialized. This is what we call a framing problem. [...]

a value-aligned superintelligence directly creates utopia. an "intent-aligned" or otherwise non-agentic truthful superintelligence, if that were to happen, is most usefully used to directly tell you how to create a value-aligned agentic superintelligence. if the thing in question cannot do one of these things it is not superintelligence, but something else.

  1. ^
Replies from: Signer, LRudL, MondSemmel, sharmake-farah, Lblack, None, MondSemmel, dr_s
comment by Signer · 2025-01-05T11:08:19.439Z · LW(p) · GW(p)

People are confused about the basics because the basics are insufficiently justified.

comment by L Rudolf L (LRudL) · 2025-01-06T16:43:40.787Z · LW(p) · GW(p)

As far as I know, my post [LW · GW] started the recent trend you complain about.

Several commenters on this thread (e.g. @Lucius Bushnaq [LW · GW] here [LW(p) · GW(p)] and @MondSemmel [LW · GW] here [LW(p) · GW(p)]) mention LessWrong's growth and the resulting influx of uninformed new users as the likely cause. Any such new users may benefit from reading my recently-curated review of Planecrash [LW · GW], the bulk of which is about summarising Yudkowsky's worldview.

i continue to feel so confused at what continuity led to some users of this forum asking questions like, "what effect will superintelligence have on the economy?" or otherwise expecting an economic ecosystem of superintelligences

If there's decision-making about scarce resources, you will have an economy. Even superintelligence does not necessarily imply infinite abundance of everything, starting with the reason that our universe only has so many atoms. Multipolar outcomes seem plausible under continuous takeoff, which the consensus view in AI safety (as I understand it) sees as more likely than fast takeoff. I admit that there are strong reasons [AF · GW] for thinking that the aggregate of a bunch of sufficiently smart things is agentic, but this isn't directly relevant for the concerns about humans within the system in my post.

a value-aligned superintelligence directly creates utopia

In his review of Peter Singer's commentary on Marx, Scott Alexander writes:

[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He would come out and say it was irresponsible to talk about how communist governments and economies will work. He believed it was a scientific law, analogous to the laws of physics, that once capitalism was removed, a perfect communist government would form of its own accord. There might be some very light planning, a couple of discussions, but these would just be epiphenomena of the governing historical laws working themselves out.

Peter Thiel might call this "indefinite optimism" [LW · GW]: delay all planning or visualisation because there's some later point where it's trusted things will all sort themselves out. Now, if you think that takeoff will definitely be extremely hard and the resulting superintelligence will effortlessly take over the world, then obviously it makes sense to focus on what that superintelligence will want to do. But what if takeoff lasts months or years or decades? (Note that there can be lots of change even within months if the stakes look extreme to powerful actors!) Aren't you curious about what an aligned superintelligence will end up deciding about society and humans? Are you so sure about the transition period being so short and the superintelligence being so unitary and multipolar outcomes being so unlikely that we'll never have to worry about problems downstream of the incentive issues and competitive pressures that I discuss (which Beren recently had an excellent post [LW · GW] on)? Are you so sure that there is not a single interesting, a priori deducible fact about the superintelligent economy beyond "a singleton is in charge and everything is utopia"?

Replies from: MondSemmel, Lblack
comment by MondSemmel · 2025-01-06T17:12:38.691Z · LW(p) · GW(p)

The default outcome is an unaligned superintelligence singleton destroying the world and not caring about human concepts like property rights. Whereas an aligned superintelligence can create a far more utopian future than a human could come up with, and cares about capitalism and property rights only to the extent that that's what it was designed to care about.

So I indeed don't get your perspective. Why are humans still appearing as agents or decision-makers in your post-superintelligence scenario at all? If the superintelligence for some unlikely reason wants a human to stick around and to do something, then it doesn't need to pay them. And if a superintelligence wants a resource, it can just take it, no need to pay for anything.

Replies from: sharmake-farah, sohaib-imran
comment by Noosphere89 (sharmake-farah) · 2025-01-06T18:26:27.881Z · LW(p) · GW(p)

@L Rudolf L [LW · GW] can speak for himself, but for me, a crux is probably that I expect neither an unaligned superintelligence singleton nor a value-aligned superintelligence creating utopia to dominate the space of likely outcomes within the next few decades.

For the unaligned superintelligence point, my basic reasons are that I now believe the alignment problem has gotten significantly easier compared to 15 years ago, that I've become more bullish on AI control working out since o3, and that I've come to think instrumental convergence is probably correct for some AIs we build in practice, but that instrumental drives are more constrainable on the likely paths to AGI and ASI.

For the alignment point, a big reason is that I now think what makes an AI aligned is primarily data rather than inductive biases; one of my biggest divergences from the LW community comes down to thinking that inductive bias is far less necessary for alignment than people usually assume, especially compared to 15 years ago.

For AI control, one update I've made from o3 is that I believe OpenAI managed to get the RL loop working in domains where outcomes are easily verifiable, but not in domains where verifying is hard; programming and mathematics are domains where verifying is easy. The tie-in is that capabilities will be more spiky/narrow than you might think, and this matters because I believe narrow/tool AI has a relevant role to play in an intelligence explosion, so you can actually affect the outcome by building narrow-capabilities AI for a few years. The fact that AI capabilities are spiky in domains where we can easily verify outcomes is also good for eliciting AI capabilities, which is a part of AI control.

For the singleton point, it's probably because I believe takeoff is both slow and somewhat distributed enough such that multiple superintelligent AIs can arise.

For the value-aligned superintelligence creating a utopia for everyone, my basic reason for not believing in this is that I believe value conflicts are effectively irresolvable due to moral subjectivism, which forces the utopia to be a utopia for only some people, and I expect the set of people in any individual utopia to be small in practice (because value conflicts become more relevant for AIs that can create nation-states all by themselves).

For why humans remain decision-makers: this is probably because AI is either controlled, or certain companies have chosen to give AIs instruction-following drives and that has actually succeeded.

comment by Sohaib Imran (sohaib-imran) · 2025-01-06T17:41:08.968Z · LW(p) · GW(p)

And why must alignment be binary? (aligned, or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights)

Why can you not have a superintelligence that is only misaligned when it comes to issues of wealth distribution?

Relatedly, are we sure that CEV is computable?

Replies from: MondSemmel
comment by MondSemmel · 2025-01-06T18:18:03.121Z · LW(p) · GW(p)

I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?

And what does it even mean for a superintelligence to be "only misaligned when it comes to issues of wealth distribution"? Can't you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?

Replies from: sohaib-imran
comment by Sohaib Imran (sohaib-imran) · 2025-01-06T23:27:56.296Z · LW(p) · GW(p)

I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?

Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that do not destroy the world? If so, why? Is it a bigger target? Is it more stable?

Can't you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?

No, because the 'you' who can ask (the people in power) are themselves misaligned with the 1 alignment target that perfectly captures all our preferences.

Replies from: MondSemmel
comment by MondSemmel · 2025-01-14T16:24:18.275Z · LW(p) · GW(p)

Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that do not destroy the world?

I didn't mean that there's only one aligned mind design, merely that almost all (99.999999...%) conceivable mind designs are unaligned by default, so the only way to survive is if the first AGI is designed to be aligned; there's no hope that a random AGI just happens to be aligned. And since we're heading for the latter scenario, it would be very surprising to me if we managed to design a partially aligned AGI and lose that way.

No, because the 'you' who can ask (the people in power) are themselves misaligned with the 1 alignment target that perfectly captures all our preferences.

I expect the people in power are worrying about this way more than they worry about the overwhelming difficulty of building an aligned AGI in the first place. (Case in point: the manufactured AI race with China.) As a result I expect they'll succeed at building a by-default-unaligned AGI and driving themselves and us to extinction. So I'm not worried about instead ending up in a dystopia ruled by some government or AI lab owner.

comment by Lucius Bushnaq (Lblack) · 2025-01-07T07:29:47.593Z · LW(p) · GW(p)

Are you so sure that there is not a single interesting, a priori deducible fact about the superintelligent economy beyond "a singleton is in charge and everything is utopia"?

End points are easier to infer than trajectories, so sure, I think there's some reasonable guesses you can try to make about how the world might look after aligned superintelligence, should we get it somehow. 

For example, I think it's a decent bet that basically all minds would exist solely as uploads almost all of the time, because living directly in physical reality is astronomically wasteful and incredibly inconvenient. Turning on a physical lamp every time you want things to be brighter means wiggling about vast numbers of particles and wasting an ungodly amount of negentropy just for the sake of the teeny tiny number of bits about these vast numbers of particles that actually make it to your eyeballs, and the even smaller number of bits that actually end up influencing your mind state and making any difference to your perception of the world. All of the particles[1] in the lamp in my bedroom, the air its light shines through, and the walls it bounces off, could be so much more useful arranged in an ordered dance of logic gates where every single movement and spin flip is actually doing something of value. If we're not being so incredibly wasteful about it, maybe we can run whole civilisations for aeons on the energy and negentropy that currently make up my bedroom. What we're doing right now is like building an abacus out of supercomputers. I can't imagine any mature civilisation would stick with this.

It's not that I refuse to speculate about how a world post aligned superintelligence might look. I just didn't think that your guess was very plausible. I don't think pre-existing property rights or state structures would matter very much in such a world, even if we don't get what is effectively a singleton, which I doubt. If a group of superintelligent AGIs is effectively much more powerful and productive than the entire pre-existing economy, your legal share of that pre-existing economy is not a very relevant factor in your ability to steer the future and get what you want. The same goes for pre-existing military or legal power.

  1. ^

    Well, the conserved quantum numbers of my room, really.

Replies from: faul_sname
comment by faul_sname · 2025-01-07T09:57:17.265Z · LW(p) · GW(p)

End points are easier to infer than trajectories

Assuming that which end point you get to doesn't depend on the intermediate trajectories at least.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-07T15:29:42.060Z · LW(p) · GW(p)

Something like a crux here is that I believe the trajectories non-trivially matter for which end-points we get. I don't think it's like entropy, where we can easily determine the end-point without considering the intermediate trajectory, because I genuinely think some path-dependence is present in history, which is why, even if I were way more charitable towards communism, I don't think this was ever defensible:

[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He would come out and say it was irresponsible to talk about how communist governments and economies will work. He believed it was a scientific law, analogous to the laws of physics, that once capitalism was removed, a perfect communist government would form of its own accord. There might be some very light planning, a couple of discussions, but these would just be epiphenomena of the governing historical laws working themselves out.

comment by MondSemmel · 2025-01-04T23:27:13.088Z · LW(p) · GW(p)

Another issue is the Eternal September issue where LW membership has grown a ton due to the AI boom (see the LW site metrics [? · GW] in the recent fundraiser post), so as one might expect, most new users haven't read the old stuff on the site. There are various ways in which the LW team tries to encourage them to read those, but nevertheless.

comment by Noosphere89 (sharmake-farah) · 2025-01-05T00:21:05.756Z · LW(p) · GW(p)

The basic answer is the following:

  1. The incentive problem still remains, such that it's more effective to use the price system than to use a command economy to deal with incentive issues:

https://x.com/MatthewJBar/status/1871640396583030806

  2. Related to this, perhaps the outer loss of the markets isn't nearly as dispensable as a lot of people on LW believe, and contact with reality is a necessary part of all future AIs.

More here:

https://gwern.net/backstop

  3. A potentially large crux is I don't really think a utopia is possible, at least in the early years even by superintelligences, because I expect preferences in the new environment to grow unboundedly such that preferences are always dissatisfied, even charitably assuming a restriction on the utopia concept to be relative to someone else's values.
Replies from: quila
comment by quila · 2025-01-05T02:56:38.186Z · LW(p) · GW(p)

The incentive problem still remains, such that it's more effective to use the price system than to use a command economy to deal with incentive issues:

going by the linked tweet, does "incentive problem" mean "needing to incentivize individuals to share information about their preferences in some way, which is currently done through their economic behavior, in order for their preferences to be fulfilled"? and contrasted with a "command economy", where everything is planned out long in advance, and possibly on less information about the preferences of individual moral patients?

if so, those sound like abstractions which were relevant to the world so far, but can you not imagine any better way a superintelligence could elicit this information? it does not need to use prices or trade. some examples:

  • it could have many copies of itself talk to them
  • it could let beings enter whatever they want into a computer in real time, or really let beings convey their preferences in whatever medium they prefer, and fulfill them[1]
  • it could mind-scan those who are okay with this.

(these are just examples selected for clarity; i personally would expect something more complex and less thing-oriented, around moral patients who are okay with/desire it, where superintelligence imbues itself as computation throughout the lowest level of physics upon which this is possible, and so it is as if physics itself is contextually aware and benevolent)

(i think these also sufficiently address your point 2, about SI needing 'contact with reality')

there is also a second (but non-cruxy) assumption here, that preference information would need to be dispersed across some production ecosystem, which would not be true given general-purpose superintelligent nanofactories. this though is not a crux as long as whatever is required for production can fit on, e.g., a planet (which the information derived in, e.g., one of those listed ways, can be communicated across at light-speed, as we partially do now).

A potentially large crux is I don't really think a utopia is possible, at least in the early years even by superintelligences, because I expect preferences in the new environment to grow unboundedly such that preferences are always dissatisfied

i interpret this to mean "some entities' values will want to use as much matter as they can for things, so not all values can be unboundedly fulfilled". this is true and not a crux. if a moral patient who wants to make unboundedly much of something actually making unboundedly much of it would be less good than other ways the world could be, then an (altruistically-)aligned agent would choose one of the other ways.

superintelligence is context-aware in this way, it is not {a rigid system which fails on outliers it doesn't expect (e.g.: "tries to create utopia, but instead gives all the lightcone to whichever maximizer requests it all first"), and so which needs a somewhat less rigid but not-superintelligent system (an economy) to avoid this}. i suspect this (superintelligence being context-aware) is effectively the crux here.

  1. ^

    (if morally acceptable, e.g. no creating hells)

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-06T18:24:42.519Z · LW(p) · GW(p)

i interpret this to mean "some entities' values will want to use as much matter as they can for things, so not all values can be unboundedly fulfilled". this is true and not a crux. if a moral patient who wants to make unboundedly much of something actually making unboundedly much of it would be less good than other ways the world could be, then an (altruistically-)aligned agent would choose one of the other ways.

superintelligence is context-aware in this way, it is not {a rigid system which fails on outliers it doesn't expect (e.g.: "tries to create utopia, but instead gives all the lightcone to whichever maximizer requests it all first"), and so which needs a somewhat less rigid but not-superintelligent system (an economy) to avoid this}. i suspect this (superintelligence being context-aware) is effectively the crux here.


The other issue is value conflicts, which I expect to be mostly irresolvable in a satisfying way by default, due to moral subjectivism combined with my belief that lots of value conflicts today are mostly suppressed because people can't make their own nation-states, but with AI, they can, and superintelligence makes the problem worse.

That's why you can't have utopia for everyone.

Replies from: quila
comment by quila · 2025-01-10T14:30:41.127Z · LW(p) · GW(p)

lots of value conflicts today are mostly suppressed because people can't make their own nation-states, but with AI, they can, and superintelligence makes the problem worse.

i think this would not happen for the same fundamental reason that an aligned superintelligence can foresee whatever you can, and prevent / not cause them if it agrees they'd be worse than other possibilities. (more generally, "an aligned superintelligence would cause some bad-to-it thing" is contradictory, usually[1].)

(i wonder if you're using the term 'superintelligence' in a different way though, e.g. to mean "merely super-human"? to be clear i definitionally mean it in the sense of optimal)

(tangentially: the 'nations' framing confuses me)[2]

That's why you can't have utopia for everyone

i think i wrote before that i agree (trivially) that not all possible values can be maximally satisfied; still, you can have the best possible world, which i think on this axis would look like "there being very many possible environments suited to different beings preferences (as long as those preferences are not to cause suffering to others)" instead of "beings with different preferences going to war with each other" (note there is no coordination problem which must be solved for that to happen. a benevolent superintelligence would itself not allow war (and on that, i'll also hedge that if there is some tragedy which would be worth the cost of war to stop, an aligned superintelligence would just stop it directly instead.))

  1. ^

    some exceptions like "it is aligned, but has the wrong decision theory, and gets acausally blackmailed"

  2. ^

    in the world of your premise (with people using superintelligence to then war over value differences), superintelligence, not nations, would be the most powerful thing (with which) to do conflict

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-10T17:25:30.000Z · LW(p) · GW(p)

i think this would not happen for the same fundamental reason that an aligned superintelligence can foresee whatever you can, and prevent / not cause them if it agrees they'd be worse than other possibilities. (more generally, "an aligned superintelligence would cause some bad-to-it thing" is contradictory, usually[1].)

(i wonder if you're using the term 'superintelligence' in a different way though, e.g. to mean "merely super-human"? to be clear i definitionally mean it in the sense of optimal)

(tangentially: the 'nations' framing confuses me)[2]

 

I think the main point is that what's worse than other possibilities partially depends on your value system at the start, and there is no non-circular way of resolving deep enough value conflicts such that you can always prevent conflict, so with sufficiently differing values, conflict can arise on its own.

(Note that when I focus on superintelligence, I don't focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do, which is actually important.)

On the nations point, my point here is that people will program their superintelligences with quite different values, and the superintelligences will disagree about what counts as optimal by their lights; if the disagreements are severe enough (which I predict is plausible if AI development cannot be controlled at all), conflict can definitely happen between the superintelligences, even if humans are no longer the main players.

Also, it's worth it to read these posts and comments, because I perceive some mistakes that are common amongst rationalists:
 

https://www.lesswrong.com/posts/895Qmhyud2PjDhte6/responses-to-apparent-rationalist-confusions-about-game [LW · GW]

https://www.lesswrong.com/posts/HFYivcm6WS4fuqtsc/dath-ilan-vs-sid-meier-s-alpha-centauri-pareto-improvements#jpCmhofRBXAW55jZv [LW(p) · GW(p)]

i think i wrote before that i agree (trivially) that not all possible values can be maximally satisfied; still, you can have the best possible world, which i think on this axis would look like "there being very many possible environments suited to different beings preferences (as long as those preferences are not to cause suffering to others)" instead of "beings with different preferences going to war with each other" (note there is no coordination problem which must be solved for that to happen. a benevolent superintelligence would itself not allow war (and on that, i'll also hedge that if there is some tragedy which would be worth the cost of war to stop, an aligned superintelligence would just stop it directly instead.))

     

I agree you can have a best possible world (though that gets very tricky in infinite realms, where utility theory breaks down), but my point here is that the best possible world is relative to a given value set, and also quite unconstrained, and your vision definitely requires other real-life value sets to lose out on a lot here.

Are you assuming that superintelligences will have common enough values for some reason? To be clear, I think this can happen, assuming AI is controlled by a specific group that has enough of a monopoly on violence to prevent others from making their own AI, but I don't have nearly the confidence that you do that conflict is always avoidable by ASIs by default.

Replies from: quila
comment by quila · 2025-01-10T19:26:35.162Z · LW(p) · GW(p)

you didn't write "yes, i use 'superintelligent' to mean super-human", so i'll write as if you also mean optimal[1]. though i suspect we may have different ideas of where optimal is, which could become an unnoticed crux, so i'm noting it.

people will program their superintelligences

i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.

in a hypothetical setup where multiple superintelligences are instantiated at close to the same time within a world, it's plausible to me that they would fight in some way, though also plausible that they'd find a way not to. as an easy reason they might fight: maybe one knows it will win (e.g., it has a slight head start and physics is such that that is pivotal).

in my model of reality: it takes ~2/15ths of a second for light to travel the length of the earth's circumference. maybe there are other bottlenecks that would push the time required for an agentic superintelligence to take over the earth to minutes-to-hours. as long as the first superintelligent (world-valuing-)agent is created at least <that time period's duration> before the next one would have been created, it will prevent that next one's creation. i assign very low likelihood to multiple superintelligences being independently created within the same hour.
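(as a quick sanity check on the ~2/15ths figure, a back-of-the-envelope sketch; the constants are just the standard equatorial circumference and vacuum light speed:)

```python
# time for light to travel the length of earth's circumference
EARTH_CIRCUMFERENCE_KM = 40_075        # equatorial circumference, km
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s

lag = EARTH_CIRCUMFERENCE_KM / SPEED_OF_LIGHT_KM_PER_S
print(f"{lag:.3f} s vs 2/15 = {2/15:.3f} s")  # 0.134 s vs 2/15 = 0.133 s
```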

this seems like a crux, and i don't yet know why you expect otherwise, unless you mean something else by superintelligence.

actually, i can see room for disagreement about whether 'slow, gradual buildup of spiky capabilities profiles' would change this. i don't think it would because ... if i try to put it into words, we are in an unstable equilibrium, which will at some point be disrupted, and there are not 'new equilibriums, just with less balance' for the world to fall on. however, gradual takeoff plus a strong defensive advantage inherent in physics could lead to it, for intuitive reasons[2]. in terms of current tech like nukes there's an offensive advantage, but we don't actually know what the limit looks like. although it's hard for me to conceive of a true defensive advantage in fundamental physics that can't be used offensively by macroscopic beings. would be interested in seeing made up examples.


i'll probably read the linked posts anyways, but it looks like you thought i also expected multiple superintelligences to arise at almost the same time, and inferred i was making implicit claims about game theory between them.

  1. ^

    you wrote:

    Nitpick that doesn't matter, but when I focus on superintelligence, I don't focus on the AI literally doing optimal actions, because that leads to likely being wrong about what AIs can actually do

    i mean something with the optimal process (of cognition (learning, problem solving, creativity)), not something that always takes the strictly best action.

    (i'm guessing this is about how the 'optimal action' could sometimes be impractical to compute. for example, the action i could technically take that has the best outcomes might technically be to send off a really alien email that sets off some unknowable-from-my-position butterfly effect.)

  2. ^

    e.g., toy game setup: if you can counter a level 100 attack at level 10, and all the players start within 5 levels of each other and progress at 1 per turn, then it doesn't matter who will reach level 100 first.
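    (a minimal sketch of this toy game, with the factor-of-10 defense threshold, 5-level head start, and 1-level-per-turn progression taken from the example above; the specific starting levels are made up for illustration:)

```python
# toy model of the footnote's game: attack power equals a player's level,
# but a player can counter any attack up to DEFENSE_FACTOR times their
# own level (a level-10 player counters a level-100 attack).
DEFENSE_FACTOR = 10
HEAD_START = 5  # players start within 5 levels of each other

def leader_can_win(turns: int) -> bool:
    leader, laggard = 1 + HEAD_START, 1  # illustrative starting levels
    for _ in range(turns):
        if leader > laggard * DEFENSE_FACTOR:  # attack overwhelms defense
            return True
        leader += 1   # both progress at 1 level per turn,
        laggard += 1  # so the gap never grows
    return False

print(leader_can_win(1000))  # False: defense scales with level, the lead stays fixed at 5
```

    (since the gap stays fixed while the defense threshold scales with level, it never matters who would reach level 100 first.)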

Replies from: sharmake-farah, quila
comment by Noosphere89 (sharmake-farah) · 2025-01-10T19:44:44.113Z · LW(p) · GW(p)

i am expecting the first superintelligent agent to capture the future, and for there to be no time for others to arise to compete with it.


I think I understand your position better, and a crux for real-world decision making is that in practice, I don't really think this assumption is correct by default, especially if there's a transition period.

Replies from: quila
comment by quila · 2025-01-10T20:05:06.288Z · LW(p) · GW(p)

i do not understand your position from this, so you're welcome to write more. also, i'm not sure if i added the paragraph about slow takeoff before or after you loaded the comment.

an easy way to convey your position to me might be to describe a practical rollout of the future where all the things in it seem individually plausible to you.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-10T21:30:56.833Z · LW(p) · GW(p)

One example of such a future: in 2028, OpenAI manages to scale up enough to make an AI that, while not as good as a human worker in general (at least without heavy inference costs), is good enough to act as a notable accelerant to AI research. By 2030-2031, AI research has been more or less automated away by OpenAI, with competitors having such systems by 2031-2032. AI progress then becomes notably faster, such that by 2033 we are on the brink of AI that can do a lot of job work, but the best models at this point are instead reinvested in AI R&D, so that by 2035 superhuman AI is broadly achieved, and this is when the economy starts getting seriously disrupted.

The key features of this future are that intent alignment works well enough that AI generally takes instructions from specific humans, and that it's easy for others to get their own superintelligences with different values, such that conflict doesn't go away.

Replies from: quila
comment by quila · 2025-01-10T21:48:20.758Z · LW(p) · GW(p)

The key features here in this future is that the superhuman equals optimal assumption is false [...]

oh, well to clarify then, i was trying to say that i didn't mean 'superhuman' at all, i directly meant optimal. i don't believe that superhuman = optimal, and when reading this story one of the first things that stood out was that the 2035 point is still before the first long-term-decisive entity.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-10T22:03:45.785Z · LW(p) · GW(p)

Edited my comment.

Replies from: quila
comment by quila · 2025-01-10T22:14:03.960Z · LW(p) · GW(p)

but it still says "it's easy for others to get their own superintelligences with different values", with 'superintelligence' referring to the 'superhuman' AI of 2035?

my response is the same, the story ends before what i meant by superintelligence has occurred.

(it's okay if this discussion was secretly a definition difference till now!)

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-10T23:15:46.814Z · LW(p) · GW(p)

Yeah, the crux is I don't think the story ends before superintelligence, for a combination of reasons.

Replies from: quila
comment by quila · 2025-01-10T23:23:40.221Z · LW(p) · GW(p)

Yeah, the crux is I don't think the story ends before superintelligence

what i meant by "the story ends before what i meant by superintelligence has occurred" is that the written one ends there in 2035, but at that point there's still time to affect what the first long-term-decisive thing will be.

Replies from: quila
comment by quila · 2025-01-12T01:34:05.344Z · LW(p) · GW(p)

but it still says "it's easy for others to get their own superintelligences with different values", with 'superintelligence' referring to the 'superhuman' AI of 2035?

still confused about this btw. in my second reply to you i wrote:

(i wonder if you're using the term 'superintelligence' in a different way though, e.g. to mean "merely super-human"?)

and you did not say you were, but it looks like you are here?

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-12T01:49:54.404Z · LW(p) · GW(p)

I was assuming very strongly superhumanly intelligent AI, but yeah, no promises of optimality were made here.

That said, I suspect a crux is that optimality ends up with multipolarity, assuming a one world government hasn't happened by then, because I think the offense-defense balance moderately favors defense even at optimality, assuming optimal defense and offense.

Replies from: quila
comment by quila · 2025-01-12T02:03:05.049Z · LW(p) · GW(p)

I was assuming very strongly superhumanly intelligent AI

oh okay, i'll have to reinterpret then. edit: i just tried, but i still don't get it; if it's "very strongly superhuman", why is it merely "when the economy starts getting seriously disrupted"? (<- this feels like it's back at where this thread started)

I think the offense-defense balance moderately favors defense even at optimality

why?

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-12T02:17:33.613Z · LW(p) · GW(p)

oh okay, i'll have to reinterpret then. edit: i just tried, but i still don't get it; if it's "very strongly superhuman", why is it merely "when the economy starts getting seriously disrupted"? (<- this feels like it's back at where this thread started)

I should probably edit that at some point, but I'm on my phone, so I'll do it tomorrow.

why?

A big reason for this is logistics, as how you are getting to the fight can actually hamper you a lot, and this especially bites hard on offense, because it's easier to get supplies to your area than it is to get supplies to an offensive unit.

This especially matters if physical goods need to be transported from one place to another place.

Replies from: quila
comment by quila · 2025-01-12T09:58:32.231Z · LW(p) · GW(p)

A big reason for this is logistics, as how you are getting to the fight can actually hamper you a lot, and this especially bites hard on offense, because it's easier to get supplies to your area than it is to get supplies to an offensive unit.

ah. for 'at optimality' which you wrote, i don't imagine it to take place on that high of a macroscopic level (the one on which 'supplies' could be transported), i think the limit is more things that look to us like the category of 'angling rays of light just right to cause distant matter to interact in such a way as to create an atomic explosion, or some even more destructive reaction we don't yet know about, or to suddenly carve out a copy of itself there to start doing things locally', and also i'm not imagining the competitors being 'solid' macroscopic entities anymore, but rather being patterns imbued (and dispersed) in a relatively 'lower' level of physics (which also do not need 'supplies'). (edit: maybe this picture is wrong, at optimality you can maybe absorb the energy of such explosions / not be damaged by them, if you're not a macroscopic thing. which does actually defeat the main way macroscopic physics has an offense advantage?)

(i'm just exploring what it would be like to be clear, i don't think such conflicts will happen because i still expect just one optimal-level-agent to come from earth)

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-12T15:55:06.252Z · LW(p) · GW(p)

(i'm just exploring what it would be like to be clear, i don't think such conflicts will happen because i still expect just one optimal-level-agent to come from earth)

I am willing to concede that here, the assumption of non-optimal agents was more necessary than I thought for my argument, and I think you are right about the necessity of the assumption in order to guarantee anything like a normal future (though it still might be multipolar), so I changed a comment.

My new point is that I don't think optimal agents will exist when we lose all control, but yes I didn't realize an assumption was more load-bearing than I thought.

Replies from: quila
comment by quila · 2025-01-12T17:46:15.158Z · LW(p) · GW(p)

My new point is that I don't think optimal agents will exist when we lose all control

(btw I also realized I didn't strictly mean 'optimal' by 'superintelligent', but at least close enough to it / 'strongly superhuman enough' for us to not be able to tell the difference. I originally used the 'optimal' wording trying to find some other definition apart from 'super-human')

it is also plausible to me that life-caring beings first lose control to much narrower programs[1] or moderately superhuman unaligned agents totally outcompeting them economically (if it turns out that making better agents is hard enough that they can't just directly do that instead), or something.

also, a 'multipolar AI-driven but still normal-ish' scenario seems to continue at most until a strong enough agent is created. (e.g. that could be what a race is towards).

(maybe after 'loss of control to weaker AI' scenarios, those weaker AIs also keep making better agents afterwards, but i'm not sure about that, because they could be myopic [? · GW] and in some stable pattern/equilibrium)

  1. ^
comment by quila · 2025-01-10T19:55:14.007Z · LW(p) · GW(p)

i missed this part:

your vision definitely requires other real-life value sets to lose out on a lot, here.

i'm not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it's hard)

in particular, i'm not sure if you're saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under 'cosmopolitan' values relative to if their values controlled the entire lightcone. example trivially true thing 2: "the best possible world is relative to a given value set")

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-10T21:34:56.282Z · LW(p) · GW(p)

i'm not sure what this means. my values basically refer to other beings having not-tormentful (and next in order of priority, happy/good) existences. (tried to formalize this more but it's hard)

That would immediately exclude quite a few people, from both the far left and far right, because I predict a lot of people definitely want at least some people to have tormentful lives.

in particular, i'm not sure if you're saying something which would seem trivially true to me or not. (example trivially true thing: someone who wants to tile literally the entire lightcone with happy humans not being able to do that is losing out under 'cosmopolitan' values relative to if their values controlled the entire lightcone. example trivially true thing 2: "the best possible world is relative to a given value set")

I was trying to say something trivially true in your ontology, but far too many people tend to deny that you do in fact have to make other values lose out. People usually think the best possible world is absolute, not relative, and in particular I think a lot of people use the idea of value-aligned superintelligence as though it were a magic wand that could solve all conflict.

Replies from: quila
comment by quila · 2025-01-12T01:27:09.006Z · LW(p) · GW(p)

far too many people tend to deny that you do in fact have to make other values lose out

i don't know where that might be true, but at least on lesswrong i imagine it's an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe). most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.

also on the "lose out" phrasing: even if someone "wants at least some people to have tormentful lives", they don't "lose out" overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2025-01-12T02:10:38.686Z · LW(p) · GW(p)

I think a crux I have with the entire alignment community may ultimately come down to me not believing that human values overlap strongly enough to make alignment the most positive thing, compared to other AI safety things.

In particular, I'd expect a surprising amount of disagreement on whether making a hell is good, if you managed to sell it as eternally punishing a favored enemy.

most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.

I agree LWers tend to at least admit that severe enough value conflicts can exist, though I think that people like Eliezer don't realize that human value conflicts sort of break collective CEV-type solutions. A lot of collective alignment solutions tend to assume either that someone puts their thumb on the scale and excludes certain values, or that human values and their idealizations are so similar that no conflicts are expected, which I personally don't think is true.

i don't know where that might be true, but at least on lesswrong i imagine it's an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe).

also on the "lose out" phrasing: even if someone "wants at least some people to have tormentful lives", they don't "lose out" overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.

Agree with this, which handles some cases, but my worry is that there are still likely to be big value conflicts where one value set must ultimately win out over another.

comment by Lucius Bushnaq (Lblack) · 2025-01-05T14:28:05.046Z · LW(p) · GW(p)

My guess is that it's just an effect of field growth. A lot of people coming in now weren't around when the consensus formed and don't agree with it or don't even know much about it.

Also, the consensus wasn't exactly uncontroversial on LW even way back in the day. Hanson's Ems inhabit a somewhat more recognisable world and economy that doesn't have superintelligence in it, and lots of skeptics used to be skeptical in the sense of thinking all of this AI stuff was way too speculative and wouldn't happen for hundreds of years if ever, so they made critiques of that form or just didn't engage in AI discussions at all. LW wasn't anywhere near this AI-centric when I started reading it around 2010. 

comment by [deleted] · 2025-01-06T15:41:13.358Z · LW(p) · GW(p)
  1. My question specifically asks about the transition to ASI, which, while I think it's really hard to predict, seems likely to take years, during which time we have intelligences just a bit above human level, before they're truly world-changingly superintelligent. I understand this isn't everyone's model, and it's not necessarily mine, but I think it is plausible.
  2. Asking "how could someone ask such a dumb question?" is a great way to ensure they leave the community. (Maybe you think that's a good thing?)
Replies from: quila
comment by quila · 2025-01-06T16:10:32.086Z · LW(p) · GW(p)

Asking "how could someone ask such a dumb question?" is a great way to ensure they leave the community. (Maybe you think that's a good thing?)

I don't, sorry. (I'd encourage you not to leave just because of this, if it was just this. maybe LW mods can reactivate your account? @Habryka [LW · GW])

My question specifically asks about the transition to ASI

Yeah looks like I misinterpreted it. I agree that time period will be important.

I'll try to be more careful.

Fwiw, I wasn't expecting this shortform to get much engagement, but given that it did, it probably feels like public shaming, if I imagine what it's like.

Replies from: habryka4, quila
comment by habryka (habryka4) · 2025-01-06T19:14:59.232Z · LW(p) · GW(p)

(Happy to reactivate your account, though I think you can also do it yourself)

comment by quila · 2025-01-06T18:47:07.151Z · LW(p) · GW(p)

I hope you're okay btw

Replies from: None
comment by [deleted] · 2025-01-06T20:43:20.679Z · LW(p) · GW(p)

I'm fine. Don't worry too much about this. It just made me think, what am I doing here? For someone to single out my question and say "it's dumb to even ask such a thing" (and the community apparently agrees)... I just think I'll be better off not spending time here.

Replies from: quila
comment by quila · 2025-01-06T21:04:39.080Z · LW(p) · GW(p)

and the community apparently agrees

I'd guess that most just skimmed what was visible from the hoverover, while under the impression it was what my text said. The engagement on your post itself is probably more representative.

For someone to single out my question

Did not mean to do that.

comment by MondSemmel · 2025-01-04T23:21:04.340Z · LW(p) · GW(p)

I guess part of the issue is that in any discussion, people don't use the same terms in the same way. Some people call present-day AI capabilities by terms like "superintelligent" in a specific domain. Which is not how I understand the term, but I understand where the idea to call it that comes from. But of course such mismatched definitions make discussions really hard. Seeing stuff like that makes it very understandable why Yudkowsky wrote the LW Sequences...

Anyway, here [LW(p) · GW(p)] is an example of a recent shortform post which grapples with the same issue that vague terms are confusing.

comment by dr_s · 2025-01-06T09:23:28.188Z · LW(p) · GW(p)

I feel like this is a bit incorrect. There are imaginable things that are smarter than humans at some tasks, smart as average humans at others, thus overall superhuman, yet controllable and therefore possible to integrate in an economy without immediately exploding into a utopian (or dystopian) singularity. The question is whether we are liable to build such things before we build the exploding singularity kind, or if the latter is in some sense easier to build and thus stumble upon first. Most AI optimists think these limited and controllable intelligences are the default natural outcome of our current trajectory and thus expect mere boosts in productivity.

Replies from: quila
comment by quila · 2025-01-06T12:14:45.638Z · LW(p) · GW(p)

There are imaginable things that are smarter than humans at some tasks, smart as average humans at others, thus overall superhuman, yet controllable and therefore possible to integrate in an economy

sure, e.g. i think (<- i may be wrong about what the average human can do) that GPT-4 meets this definition (far superhuman at predicting author characteristics, above-average-human at most other abstract things). that's a totally different meaning.

Most AI optimists think these limited and controllable intelligences are the default natural outcome of our current trajectory and thus expect mere boosts in productivity.

do you mean they believe superintelligence (the singularity-creating kind) is impossible, and so don't also expect it to come after? it's not sufficient for less capable AIs to come before superintelligence by default.

Replies from: dr_s
comment by dr_s · 2025-01-07T11:01:06.948Z · LW(p) · GW(p)

I think some believe it's downright impossible and others that we'll just never create it because we have no use for something so smart it overrides our orders and wishes. That at most we'll make a sort of magical genie still bound by us expressing our wishes.

comment by quila · 2024-11-08T12:50:57.036Z · LW(p) · GW(p)

nothing short of death can stop me from trying to do good.

the world could destroy or corrupt EA, but i'd remain an altruist.

it could imprison me, but i'd stay focused on alignment, as long as i could communicate to at least one on the outside.

even if it tried to kill me, i'd continue in the paths through time where i survived.

Replies from: programcrafter, LosPolloFowler, fallcheetah7373
comment by ProgramCrafter (programcrafter) · 2024-11-08T20:17:25.726Z · LW(p) · GW(p)

Never say 'nothing' :-)

  1. the world might be in such a state that attempts to do good bring it into some failure instead, and doing the opposite is prevented by society
    (AI rise and blame-credit which rationality movement takes for it, perhaps?)
  2. what if, for some numerical scale, the world would give you option "with 50%, double goodness score; otherwise, lose almost everything"? Maximizing EV on this is very dangerous...
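The second worry can be made concrete with a quick simulation (numbers are illustrative — "lose almost everything" is modeled here as keeping 1% of the score):

```python
import random

def repeated_gamble(rounds=20, trials=100_000, keep_fraction=0.01, seed=0):
    """Take the '50%: double goodness; 50%: lose almost everything' bet
    repeatedly, starting from a goodness score of 1.0. Returns the
    fraction of trials ending below the starting score."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        score = 1.0
        for _ in range(rounds):
            if rng.random() < 0.5:
                score *= 2
            else:
                score *= keep_fraction
        if score < 1.0:
            ruined += 1
    return ruined / trials

# Each single bet has positive expected value (0.5*2 + 0.5*0.01 > 1),
# yet after 20 rounds nearly every trajectory ends below where it started.
print(repeated_gamble())
```

The EV-maximizing policy takes the bet every time, but almost all of the expected value is concentrated in a vanishing fraction of trajectories — which is the sense in which maximizing EV on this is very dangerous.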
comment by Stephen Fowler (LosPolloFowler) · 2024-11-09T02:27:09.910Z · LW(p) · GW(p)

I upvoted because I imagine more people reading this would slightly nudge group norms in a direction that is positive.

But being cynical:

  • I'm sure you believe that this is true, but I doubt that it is literally true.
  • Signalling this position is very low risk when the community is already on board.
  • Trying to do good may be insufficient if your work on alignment ends up being dual use. 
comment by lesswronguser123 (fallcheetah7373) · 2024-11-10T03:06:22.429Z · LW(p) · GW(p)

Trying to do good.

 

"No!  Try not!  Do, or do not.  There is no try."
       —Yoda

Trying to try [LW · GW]

Replies from: quila
comment by quila · 2024-11-10T03:49:52.448Z · LW(p) · GW(p)

if i left out the word 'trying' to (not) use it in that way instead [LW · GW], nothing about me would change, but there would be more comments saying that success is not certain.

i also disagree with the linked post[1], which says that 'i will do x' means one will set up a plan to achieve the highest probability of x they can manage. i think it instead usually means one believes they will do x with sufficiently high probability to not mention the chance of failure.[2] the post acknowledges the first half of this -- «Well, colloquially, "I'm going to flip the switch" and "I'm going to try to flip the switch" mean more or less the same thing, except that the latter expresses the possibility of failure.» -- but fails to integrate that something being said implies belief in its relevance/importance, and so concludes that using the word 'try' (or, by extrapolation, expressing the possibility of failure in general) is unnecessary in general.

  1. ^

    though its psychological point seems true:

    But if all you want is to "maximize the probability of success using available resources", then that's the easiest thing in the world to convince yourself you've done.

  2. ^

    this is why this wording is not used when the probability of success is sufficiently far (in percentage points, not logits) from guaranteed.

Replies from: fallcheetah7373
comment by lesswronguser123 (fallcheetah7373) · 2024-11-10T04:05:35.044Z · LW(p) · GW(p)

I think the post was a deliberate attempt to overcome that psychology; the issue is that you can get stuck in these loops of "trying to try" and convincing yourself that you did enough. This is tricky because it's very easy to rationalise this part for the sake of comfort.

When you set up for winning vs. try to set up for winning:

The latter is much easier to do than the former, and the former still implies a chance of failure, but you actually try to do your best rather than try to try to do your best.

I think this sounds convoluted, maybe there is a much easier cognitive algorithm to overcome this tendency.

comment by quila · 2024-11-13T03:31:42.072Z · LW(p) · GW(p)

i might try sleeping for a long time (16-24 hours?) by taking sublingual[1] melatonin right when i start to be awake, and falling asleep soon after. my guess: it might increase my cognitive quality on the next wake up, like this:

(or do useful computation during sleep, leading to apparently having insights on the next wakeup? long elaboration below)

i wonder if it's even possible, or if i'd have trouble falling asleep again despite the melatonin.

i don't see much risk to it, since my day/night cycle is already uncalibrated[2], and melatonin is naturally used for this narrow purpose in the body. 


'cognitive quality' is really vague. here's what i'm really imagining

my unscientific impression of sleep, from subjective experience (though i only experience the result) and speculation i've read, is that it does these things:

  • integrates into memory what happened in the previous wake period, and maybe to a lesser extent further previous ones
  • more separate from the previous wake period, acts on my intuitions or beliefs about things to 'reconcile' or 'compute implicated intuitions'. for example, if i was trying to reconcile two ideas, or solve some confusing logical problem, maybe the next day i would find it easier because more background computation has been done about it?
    • maybe the same kind of background cognition that happens during the day, that leads to people having ideas random-feelingly enter their awareness?
    • this is the one i feel like i have some sub-linguistic understanding of how it works in me, and it seems like the more important of the two for abstract problem solving, which memories don't really matter to. for this reason, a higher proportion of sleep or near-sleep in general may be useful for problem solving.

but maybe these are not done almost as much as they could be, because of competing selection pressures for different things, of which sleep-time computations are just some. (being awake is useful to gather food and survive)

anyways, i imagine that after those happening for a longer time, the waking mental state could be very 'fresh', aka more unburdened by previous thoughts/experiences (bulletpoint 1), and prone to creativity [LW(p) · GW(p)]/'apparently' having new insights (bulletpoint 2). (there is something it feels like to be in such a state for me, and it happens more just after waking)

  1. ^

    takes effect sooner

  2. ^

     i have the non-24 hour sleep/wake cycle that harry has in HPMOR. for anyone who also does, some resources: 

    from author's note, chapter 98:

    Last but not least:

    You know Harry’s non-24 sleep disorder?  I have that.  Normally my days are around 24 hours and 30 minutes long.

    Around a year ago, some friends of mine cofounded MetaMed, intended to provide high-grade analysis of the medical literature for people with solution-resistant medical problems.  (I.e. their people know Bayesian statistics and don’t automatically believe every paper that claims to be ‘statistically significant’ – in a world where only 20-30% of studies replicate, they not only search the literature, but try to figure out what’s actually true.)  MetaMed offered to demonstrate by tackling the problem of my ever-advancing sleep cycle.

    Here’s some of the things I’ve previously tried:

    • Taking low-dose melatonin 1-2 hours before bedtime
    • Using timed-release melatonin
    • Installing red lights (blue light tells your brain not to start making melatonin)
    • Using blue-blocking sunglasses after sunset
    • Wearing earplugs
    • Using a sleep mask
    • Watching the sunrise
    • Watching the sunset
    • Blocking out all light from the windows in my bedroom using aluminum foil, then lining the door-edges with foam to prevent light from slipping in the cracks, so I wouldn’t have to use a sleep mask
    • Spending a total of ~$2200 on three different mattresses (I cannot afford the high-end stuff, so I tried several mid-end ones)
    • Trying 4 different pillows, including memory foam, and finally settling on a folded picnic blanket stuffed into a pillowcase (everything else was too thick)
    • Putting 2 humidifiers in my room, a warm humidifier and a cold humidifier, in case dryness was causing my nose to stuff up and thereby diminish sleep quality
    • Buying an auto-adjusting CPAP machine for $650 off Craigslist in case I had sleep apnea.  ($650 is half the price of the sleep study required to determine if you need a CPAP machine.)
    • Taking modafinil and R-modafinil.
    • Buying a gradual-light-intensity-increasing, sun alarm clock for ~$150

    Not all of this was futile – I kept the darkened room, the humidifiers, the red lights, the earplugs, and one of the mattresses; and continued taking the low-dose  and time-release melatonin.  But that didn’t prevent my sleep cycle from advancing 3 hours per week (until my bedtime was after sunrise, whereupon I would lose several days to staying awake until sunset, after which my sleep cycle began slowly advancing again).

    MetaMed produced a long summary of extant research on non-24 sleep disorder, which I skimmed, and concluded by saying that – based on how the nadir of body temperature varies for people with non-24 sleep disorder and what this implied about my circadian rhythm – their best suggestion, although it had little or no clinical backing, was that I should take my low-dose melatonin 5-7 hours before bedtime, instead of 1-2 hours, a recommendation which I’d never heard anywhere before.

    And it worked.

    I can’t *#&$ing believe that #*$%ing worked.

    (EDIT in response to reader questions:  “Low-dose” melatonin is 200microgram (mcg) = 0.2 mg.  Currently I’m taking 0.2mg 5.5hr in advance, and taking 1mg timed-release just before closing my eyes to sleep.  However, I worked up to that over time – I started out just taking 0.3mg total, and I would recommend to anyone else that they start at 0.2mg.)

    other resources: https://slatestarcodex.com/2018/07/10/melatonin-much-more-than-you-wanted-to-know/, https://www.reddit.com/r/N24/comments/fylcmm/useful_links_n24_faq_and_software/ 
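    (a trivial sketch of the timing arithmetic in the quoted protocol — low-dose melatonin 5.5 hours before target bedtime; the helper name is mine:)

    ```python
    from datetime import datetime, timedelta

    def melatonin_dose_time(target_bedtime: str, hours_before: float = 5.5) -> str:
        """Given a target bedtime like '23:30', return the clock time to take
        the low-dose (0.2mg) melatonin under the quoted protocol."""
        bed = datetime.strptime(target_bedtime, "%H:%M")
        dose = bed - timedelta(hours=hours_before)
        return dose.strftime("%H:%M")

    print(melatonin_dose_time("23:30"))  # 18:00
    ```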

Replies from: elityre, Emrik North, rhollerith_dot_com
comment by Eli Tyre (elityre) · 2024-11-13T03:48:10.757Z · LW(p) · GW(p)

I predict this won't work as well as you hope because you'll be fighting the circadian effect that partially influences your cognitive performance.

Also, some ways to maximize your sleep quality are to exercise very intensely and/or to sauna, the day before.

comment by Emrik (Emrik North) · 2024-11-13T22:31:26.296Z · LW(p) · GW(p)

Heh, I've gone the opposite way and now do 3h sleep per 12h-days. The aim is to wake up during REM/light-sleep at the end of the 2nd sleep cycle, but I don't have a clever way of measuring this[1] except regular sleep-&-wake-times within the range of what the brain can naturally adapt its cycles to.

I think the objective should be to maximize the integral of cognitive readiness over time,[2] so here are some considerations (sorry for lack of sources; feel free to google/gpt; also sorry this is sorta redundant, but I didn't wish to spend time paring it down):

  • Restorative effects of sleep have diminishing marginal returns
    • I think a large reason we sleep is that metabolic waste-clearance is more efficiently batch-processed, because optimal conditions for waste-clearance are way different from optimal conditions for cognition (and substantial switching-costs between, as indicated by how difficult it can be to actually start sleeping). And this differentially takes place during deep sleep.
    • Proportion of REM-sleep in a cycle increases per cycle, with a commensurate decrease in deep sleep (SWS).
      • Two unsourced illustrations I found in my notes:

        • Note how N3 (deep sleep) drops off fairly drastically after 3 hours (~2 full sleep cycles).
  • REM & SWS do different things, and I like the things SWS do more
    • Eg acetylcholine levels (ACh) are high during REM & awake, and low during SWS. ACh functions as a switch between consolidation & encoding of new memories.[3] Ergo REM is for exploring/generalizing novel patterns, and SWS is for consolidating/filtering them.
    • REM seems to differentially improve procedural memories, whereas SWS more for declarative memories.
      • (And who cares about procedural memories anyway. :p)
    • (My most-recent-pet-hunch is that ACh is required for integrating new episodic memories into hippocampal theta waves (via the theta-generating Medial Septum in the Cholinergic Basal Forebrain playing 'conductor' for the hippocampus), which is why you can't remember anything from deep sleep, and why drugs that inhibit ACh also prevent encoding new memories.)

So in summary, two (comparatively minor) reasons I like polyphasic short sleep is:

  • SWS differentially improves declarative over procedural memories.
  • Early cycles have proportionally more SWS.
  • Ergo more frequent shorter sleep sessions will maximize the proportion of sleep that goes to consolidation of declarative memories.
    • Note: I think the exploratory value of REM-sleep is fairly limited, just based on the personal observation that I mostly tend to dream about pleasant social situations, and much less about topics related to conceptual progress. I can explore much more efficiently while I'm awake.
    • Also, because I figure my REM-dreams are so socially-focused, I think more of it risks marginally aligning my daily motivations with myopically impressing others, at the cost of motivations aimed at more abstract/illegible/longterm goals.
    • (Although I would change my mind if only I could manage to dream of Maria [LW · GW] more, since trying to impress her is much more aligned [EA(p) · GW(p)] with our-best-guess about what saves the world compared to anything else.)
  • And because of diminishing marginal returns to sleep-duration, and assuming cognition is best in the morning (anecdotally true), I maximize high-quality cognition by just... having more mornings preceded by what-best-I-can-tell-is-near-optimal-sleep (ceiling effect).
  • Lastly, just anecdotally, having two waking-sessions per 24h honestly just feels like I have ~twice the number of days in a week in terms of productivity. This is much more convincing to me than the above.
    • Starting mornings correctly seems to be incredibly important, and some of the effect of those good morning-starts dissipate the longer I spend awake. Mornings work especially well as hooks/cues for starting effective routines, sorta like a blank slate[4] I can fill in however I want if I can get the cues in before anything else has time to hijack the day's cognition/motivations.

      See my (outdated-but-still-maybe-inspirational) morning routine [LW(p) · GW(p)].

      My mood is harder to control/predict in evenings due to compounding butterfly effects over the course of a day, and fewer natural contexts I can hook into with the right course-corrections before the day ends.
  1. ^

    Waking up with morning-wood is some evidence of REM, but I don't know how reliable that is. ^^

  2. ^

    ~~Pedantically~~ Technically, we want to maximize brain-usefwlness over time, which in this case would be the integral of [[the distribution of cognitive readiness over time] pointwise multiplied by [the distribution of brain-usefwlness over cognitive readiness]].

    This matters if, for example, you get disproportionately more usefwlness from the peaks of cognitive readiness, in which case you might want to sacrifice more median wake-time in order to get marginally more peak-time.
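
    As a sketch (notation mine, not from any source): write $r(t)$ for cognitive readiness at time $t$ and $u(r)$ for brain-usefwlness at readiness level $r$; then the thing to maximize is

    ```latex
    % choose the sleep schedule that maximizes total usefwlness over the waking day
    \max_{\text{schedule}} \int_{0}^{T} u\bigl(r(t)\bigr)\,\mathrm{d}t
    ```

    If $u$ is convex (peaks disproportionately usefwl), trading median wake-time for more peak-time can win; if $u$ is roughly linear, this reduces to just maximizing average readiness.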

    I assume this is what your suggested strategy tries to do. However, I doubt it actually works, due to diminishing returns to marginal sleep time (and, I suspect, ...)

  3. ^

    > "Our findings support the notion that ACh acts as a switch between modes of acquisition and consolidation." (2006)

    > "Acetylcholine Mediates Dynamic Switching Between Information Coding Schemes in Neuronal Networks" (2019)

  4. ^

    "Blank slate" I think caused by eg flushing neurotransmitters out of synaptic clefts (and maybe glucose and other mobile things), basically rebooting attentional selection-history [LW(p) · GW(p)], and thereby reducing recent momentum for whatever's influenced you short-term.

comment by RHollerith (rhollerith_dot_com) · 2024-11-13T06:50:36.821Z · LW(p) · GW(p)

A lot of people e.g. Andrew Huberman (who recommends many supplements for cognitive enhancement and other ends) recommend against supplementing melatonin except to treat insomnia that has failed to respond to many other interventions.

Replies from: quila
comment by quila · 2024-11-13T07:41:06.760Z · LW(p) · GW(p)

recommend against supplementing melatonin

why?

i searched Andrew Huberman melatonin and found this, though it looks like it may be an AI generated summary.

Replies from: rhollerith_dot_com
comment by RHollerith (rhollerith_dot_com) · 2024-11-13T08:09:05.932Z · LW(p) · GW(p)

The CNS contains dozens of "feedback loops". Any intervention that drastically alters the equilibrium point of several of those loops is generally a bad idea unless you are doing it to get out of some dire situation, e.g., seizures. That's my recollection of Huberman's main objection put into my words (because I don't recall his words).

Supplementing melatonin is fairly unlikely to have (much of) a permanent effect on the CNS, but you can waste a lot of time by temporarily messing up CNS function for the duration of the melatonin supplementation (because a person cannot make much progress in life with even a minor amount of messed-up CNS function).

A secondary consideration is that melatonin is expensive to measure quantitatively, so the amount tends to vary a lot from what is on the label. In particular, there are reputational consequences and possible legal consequences to a brand's having been found to have less than the label says, so brands tend to err on the side of putting too much melatonin in per pill, which often ends up being manyfold more than the label says.

There are many better ways to regularize the sleep rhythm. My favorite is ensuring I get almost no light at night (e.g., having foil on the windows of the room I sleep in) but then get the right kind of light in the morning, which entails understanding how light affects the intrinsically photosensitive retinal ganglion cells and how those cells influence the circadian rhythm. In fact, I'm running my screens (computer screen and iPad screen) in grayscale all day long to prevent yellow-blue contrasts on the screen from possibly affecting my circadian rhythm. I also use magnesium and theanine according to a complex protocol of my own devising.

comment by quila · 2024-10-06T18:20:18.409Z · LW(p) · GW(p)

i don't think having (even exceptionally) high baseline intelligence and then studying bias avoidance techniques is enough for one to be able to derive an alignment solution. i have not seen in any rationalist i'm aware of what feels like enough for that, though their efforts are virtuous of course. it's just that the standard set by the universe seems higher.

i think this is a sort of background belief for me. not failing at thinking is the baseline; other needed computations are harder. they are not satisfied by avoiding failure conditions, but require the satisfaction of some specific, hard-to-find success condition. learning about human biases will not train one to cognitively seek answers of this kind, only to avoid premature failure.

this is basically a distinction between rationality and creativity. rationality[1] is about avoiding premature failure, creativity is about somehow generating new ideas.

but there is not actually something which will 'guide us through' creativity, like hpmor/the sequences do for rationality. there are various scattered posts about it[2].

i also do not have a guide to creativity to share with you. i'm only pointing at it as an equally if not more important thing.

if there is an art for creativity in the sense of narrow-solution-seeking, then where is it? somewhere in books buried deep in human history? if there is not yet an art, please link more scattered posts or comment new thoughts if you have any.

  1. ^

    (as i perceive it, though sometimes i see what i'd call creativity advice considered a part of rationality - doesn't matter)

  2. ^
Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2024-10-07T06:52:03.783Z · LW(p) · GW(p)

i also do not have a guide to creativity to share with you.

I do. Edward de Bono’s oeuvre is all about this, beginning with the work that brought him to public notice and coined an expression that I think most people do not know the origin of these days, “Lateral Thinking”. He and lateral thinking were famous back in the day, but have faded from public attention since. He has been mentioned before on LessWrong, but only a handful of times.

There are also a few individual works, such as “Oblique Strategies” and TRIZ.

The “Draftsmen” podcast by two artists/art instructors contains several episodes on the subject. These are specific to the topic of making art, which was my interest in watching the series, but the ideas may generalise.

One can uncreatively google “how to be creative” and get a ton of hits, although from eyeballing them I expect most to be fairly trite.

Replies from: quila
comment by quila · 2024-10-07T09:25:23.685Z · LW(p) · GW(p)

The “Draftsmen” podcast by two artists/art instructors contains several episodes on the subject

i am an artist as well :). i actually doubt that most artists could give much insight here; i think that artist creativity, and also mathematician creativity etc - human creativity generally - is usually of the default, mysterious kind, where we don't know where it comes from / it 'just happens', like intuitions, thoughts, and realizations do. it's not actually fundamentally different from those, just called 'creativity' more often in certain domains like art.

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2024-10-07T09:31:55.817Z · LW(p) · GW(p)

The sources I listed are all trying to demystify it, Edward de Bono explicitly so. They are saying, there are techniques, methods, and tools for coming up with new ideas, just as the Sequences are saying, there are techniques, methods, and tools for judging ideas so as to approach the truth of things.

In creativity, there is no recipe with which you can just crank the handle and it will spit out the right idea, but neither is there in rationality a recipe with which you can just crank the handle and come up with a proof of a conjecture.

Replies from: quila
comment by quila · 2024-10-07T09:53:19.760Z · LW(p) · GW(p)

yep not contesting any of that

neither is there in rationality a recipe with which you can just crank the handle and come up with a proof of a conjecture

to be clear, coming up with proofs is a central example of what i meant by creativity. ("they are not satisfied by avoiding failure conditions, but require the satisfaction of some specific, hard-to-find success condition")

comment by quila · 2024-06-01T10:14:03.444Z · LW(p) · GW(p)

i currently believe that working on superintelligence-alignment is likely the correct choice from a fully-negative-utilitarian perspective.[1]

for others, this may be an intuitive statement or unquestioned premise. for me it is not, and i'd like to state my reasons for believing it, partially as a response to this post concerned about negative utilitarians trying to accelerate progress towards an unaligned-ai-takeover [LW(p) · GW(p)].

there was a period during which i was more uncertain about this question, and avoided openly sharing minimally-dual-use [LW · GW] alignment research (but did not try to accelerate progress towards a nonaligned-takeover) while resolving that uncertainty.

a few relevant updates since then:

  1. decrease on the probability that the values an aligned AI would have would endorse human-caused moral catastrophes such as human-caused animal suffering.

    i did not automatically believe humans to be good-by-default, and wanted to take time to seriously consider what i think should be a default hypothesis-for-consideration upon existing in a society that generally accepts an ongoing mass torture event.
  2. awareness of vastly worse possible s-risks.

    factory farming is a form of physical torture, by which i mean torture of a mind which is done through the indirect route of affecting its input channels (body/senses). it is also a form of psychological torture. it is very bad, but situations which are magnitudes worse seem possible, where a mind is modulated directly (on the neuronal level) and fully.

    compared to 'in-distribution suffering' (eg animal suffering, human-social conflicts), i find it further less probable that an AI aligned to some human-specified values[2] would create a future with this.

    i think it's plausible that it exists rarely in other parts of the world, though, and if so would be important to prevent through acausal trade if we can.
  3. (also see Kaj Sotala's reply [LW(p) · GW(p)] about some plausible incidental s-risks)

i am not free of uncertainty about the topic, though.

in particular, if disvalue of suffering is common across the world, such that the suffering which can be reduced through acausal trade will be reduced through acausal trade regardless of whether we create an AI which disvalues suffering, then it would no longer be the case that working on alignment is the best decision for a purely negative utilitarian.

despite this uncertainty, my current belief is that the possibility of reducing suffering via acausal trade (including possibly such really-extreme forms of suffering) outweighs the probability and magnitude of human-aligned-AI-caused suffering.[3]

also, to be clear, if it ever seems that an actualized s-risk takeover event is significantly more probable than it seems now[4] as a result of unknown future developments, i would fully endorse causing a sooner unaligned-but-not-suffering takeover to prevent it.

  1. ^

    i find it easier to write this post as explaining my position as "even for a pure negative utilitarian, i think it's the correct choice", because it lets us ignore individual differences in how much moral weight is assigned to suffering relative to everything else.

    i think it's pretty improbable that i would, on 'idealized reflection'/CEV [? · GW], endorse total-negative-utilitarianism (which has been classically pointed out as implying, e.g, preferring a universe with nothing to a universe containing a robust utopia plus an instance of light suffering).

    i self-describe as a "suffering-focused altruist" or "negative-leaning-utilitarian." ie, suffering seems much worse to me than happiness seems good.

  2. ^

    (though certainly there are some individual current humans who would do this, for example to digital minds, if given the ability to do so. rather, i'm expressing a belief that it's very probable that an aligned AI which practically results from this situation would not allow that to happen.)

  3. ^

    (by 'human-aligned AI', I mean one pointed [LW · GW] to an actual CEV of one or a group of humans (which could indirectly imply the 'CEV of everyone' but without actually-not-being-that and failing in the below way, and without allowing cruel values of some individuals to enter into it).

    I don't mean an AI aligned to some sort of 'current institutional process', like voting, involving all living humans -- I think that should be avoided due to politicization risk and potential for present/unreflective(/by which i mean cruel)-values lock-in.)

  4. ^

    there's some way to formalize with bayes equations how likely, from a negative-utilitarian perspective, an s-risk needs to be (relative to a good outcome) to terminate a timeline.

    it would intake probability distributions related to 'the frequency of suffering-disvalue across the universal distribution of ASIs' and 'the frequency of various forms of s-risks that are preventable with acausal trade'. i might create this formalization later.

    if we think there are pretty certainly more preventable-through-trade-type suffering-events than there are altruistic ASIs to prevent them, a local preventable-type s-risk might actually need to be 'more likely than the good/suffering-disvaluing outcome'.
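
    a minimal sketch of the shape such a formalization might take (symbols are my placeholders, not worked-out distributions): with $p_s$ and $p_g$ the probabilities of the local s-risk and of the good (suffering-disvaluing) outcome, $S_{\text{local}}$ the suffering the s-risk would create, and $S_{\text{trade}}$ the distant suffering our good outcome would marginally prevent via acausal trade, terminating the timeline is net-positive for a negative utilitarian roughly when

    ```latex
    % terminate iff expected local suffering averted exceeds
    % expected distant suffering our good outcome would avert via trade
    p_s \cdot S_{\text{local}} > p_g \cdot S_{\text{trade}}
    ```

    matching the paragraph above: if $S_{\text{trade}}$ is not much larger than $S_{\text{local}}$ (ie other altruistic ASIs already cover most tradeable suffering), then termination requires roughly $p_s > p_g$.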

Replies from: carado-1, Kaj_Sotala
comment by Tamsin Leake (carado-1) · 2024-06-01T11:16:01.434Z · LW(p) · GW(p)

single-use

Considering how long it took me to get that by this you mean "not dual-use", I expect some others just won't get it.

comment by Kaj_Sotala · 2024-06-01T11:14:59.691Z · LW(p) · GW(p)

You may find Superintelligence as a Cause or Cure for Risks of Astronomical Suffering of interest; among other things, it discusses s-risks that might come about from having unaligned AGI.

Superintelligence is related to three categories of suffering risk: suffering subroutines (Tomasik 2017), mind crime (Bostrom 2014) and flawed realization (Bostrom 2013).

5.1 Suffering subroutines

Humans have evolved to be capable of suffering, and while the question of which other animals are conscious or capable of suffering is controversial, pain analogues are present in a wide variety of animals. The U.S. National Research Council’s Committee on Recognition and Alleviation of Pain in Laboratory Animals (2004) argues that, based on the state of existing evidence, at least all vertebrates should be considered capable of experiencing pain.

Pain seems to have evolved because it has a functional purpose in guiding behavior: evolution having found it suggests that pain might be the simplest solution for achieving its purpose. A superintelligence which was building subagents, such as worker robots or disembodied cognitive agents, might then also construct them in such a way that they were capable of feeling pain—and thus possibly suffering (Metzinger 2015)—if that was the most efficient way of making them behave in a way that achieved the superintelligence’s goals.

Humans have also evolved to experience empathy towards each other, but the evolutionary reasons which cause humans to have empathy (Singer 1981) may not be relevant for a superintelligent singleton which had no game-theoretical reason to empathize with others. In such a case, a superintelligence which had no disincentive to create suffering but did have an incentive to create whatever furthered its goals, could create vast populations of agents which sometimes suffered while carrying out the superintelligence’s goals. Because of the ruling superintelligence’s indifference towards suffering, the amount of suffering experienced by this population could be vastly higher than it would be in e.g. an advanced human civilization, where humans had an interest in helping out their fellow humans.

Depending on the functional purpose of positive mental states such as happiness, the subagents might or might not be built to experience them. For example, Fredrickson (1998) suggests that positive and negative emotions have differing functions. Negative emotions bias an individual’s thoughts and actions towards some relatively specific response that has been evolutionarily adaptive: fear causes an urge to escape, anger causes an urge to attack, disgust an urge to be rid of the disgusting thing, and so on. In contrast, positive emotions bias thought-action tendencies in a much less specific direction. For example, joy creates an urge to play and be playful, but “play” includes a very wide range of behaviors, including physical, social, intellectual, and artistic play. All of these behaviors have the effect of developing the individual’s skills in whatever the domain. The overall effect of experiencing positive emotions is to build an individual’s resources—be those resources physical, intellectual, or social.

To the extent that this hypothesis were true, a superintelligence might design its subagents in such a way that they had pre-determined response patterns for undesirable situations, so exhibited negative emotions. However, if it was constructing a kind of a command economy in which it desired to remain in control, it might not put a high value on any subagent accumulating individual resources. Intellectual resources would be valued to the extent that they contributed to the subagent doing its job, but physical and social resources could be irrelevant, if the subagents were provided with whatever resources necessary for doing their tasks. In such a case, the end result could be a world whose inhabitants experienced very little if any in the way of positive emotions, but did experience negative emotions. [...]

5.2 Mind crime

A superintelligence might run simulations of sentient beings for a variety of purposes. Bostrom (2014, p. 152) discusses the specific possibility of an AI creating simulations of human beings which were detailed enough to be conscious. These simulations could then be placed in a variety of situations in order to study things such as human psychology and sociology, and be destroyed afterwards.

The AI could also run simulations that modeled the evolutionary history of life on Earth in order to obtain various kinds of scientific information, or to help estimate the likely location of the "Great Filter" (Hanson 1998) and whether it should expect to encounter other intelligent civilizations. This could repeat the wild-animal suffering (Tomasik 2015, Dorado 2015) experienced in Earth's evolutionary history. The AI could also create and mistreat, or threaten to mistreat, various minds as a way to blackmail other agents. [...]

5.3 Flawed realization

A superintelligence with human-aligned values might aim to convert the resources in its reach into clusters of utopia, and seek to colonize the universe in order to maximize the value of the world (Bostrom 2003a), filling the universe with new minds and valuable experiences and resources. At the same time, if the superintelligence had the wrong goals, this could result in a universe filled by vast amounts of disvalue.

While some mistakes in value loading may result in a superintelligence whose goal is completely unlike what people value, certain mistakes could result in flawed realization (Bostrom 2013). In this outcome, the superintelligence’s goal gets human values mostly right, in the sense of sharing many similarities with what we value, but also contains a flaw that drastically changes the intended outcome.

For example, value-extrapolation (Yudkowsky 2004) and value-learning (Soares 2016, Sotala 2016) approaches attempt to learn human values in order to create a world that is in accordance with those values.

There have been occasions in history when circumstances that cause suffering have been defended by appealing to values which seem pointless to modern sensibilities, but which were nonetheless a part of the prevailing values at the time. In Victorian London, the use of anesthesia in childbirth was opposed on the grounds that being under the partial influence of anesthetics may cause “improper” and “lascivious” sexual dreams (Farr 1980), with this being considered more important to avoid than the pain of childbirth.

A flawed value-loading process might give disproportionate weight to historical, existing, or incorrectly extrapolated future values whose realization then becomes more important than the avoidance of suffering. Besides merely considering the avoidance of suffering less important than the enabling of other values, a flawed process might also tap into various human tendencies for endorsing or celebrating cruelty (see the discussion in section 4), or outright glorifying suffering. Small changes to a recipe for utopia may lead to a future with much more suffering than one shaped by a superintelligence whose goals were completely different from ours.

Replies from: quila
comment by quila · 2024-06-01T12:10:11.093Z · LW(p) · GW(p)

thanks for sharing. here's my thoughts on the possibilities in the quote.

Suffering subroutines - maybe 10-20% likely. i don't think suffering reduces to "pre-determined response patterns for undesirable situations," because i can think of simple algorithmic examples of that which don't seem like suffering.

suffering feels like it's about the sense of aversion/badness (often in response to a situation), and not about the policy "in <situation>, steer towards <new situation>". (maybe humans were instilled with a policy of steering away from 'suffering' states generally, and that's why evolution made us enter those states in some types of situation?). (though i'm confused about what suffering really is)

i would also give the example of positive-feeling emotions sometimes being narrowly directed. for example, someone can feel 'excitement/joy' about a gift or event and want to <go to/participate in> it. sexual and romantic subroutines can also be both narrowly-directed and positive-feeling. though these examples lack the element of a situation being steered away from, vs steering (from e.g. any neutral situation) towards other ones.

Suffering simulations - seems likely (75%?) for the estimation of universal attributes, such as the distribution of values. my main uncertainty is about whether there's some other way for the ASIs to compute that information which is simple enough to be suffering-free. this also seems lower magnitude than other classes, because (unless it's being calculated indefinitely for ever-greater precision) this computation terminates at some point, rather than lasting until heat death (or forever if it turns out that's avoidable).

Blackmail - i don't feel knowledgeable enough about decision theory to put a probability on this one, but in the case where it works (or is precommitted to under uncertainty in hopes that it works), it's unfortunately a case where building aligned ASI would incentivize unaligned entities to do it.

Flawed realization - again i'm too uncertain about what real-world paths lead to this, but intuitively, it's worryingly possible if the future contains LLM-based LTPAs (long-term planning agents) intelligent enough to solve alignment and implement their own (possibly simulated [LW · GW]) 'values'.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2024-06-01T14:50:34.004Z · LW(p) · GW(p)

Suffering subroutines - maybe 10-20% likely. i don't think suffering reduces to "pre-determined response patterns for undesirable situations," because i can think of simple algorithmic examples of that which don't seem like suffering.

Yeah, I agree with this to be clear. Our intended claim wasn't that just "pre-determined response patterns for undesirable situations" would be enough for suffering. Actually, there were meant to be two separate claims, which I guess we should have distinguished more clearly:

1) If evolution stumbled on pain and suffering, those might be relatively easy and natural ways to get a mind to do something. So an AGI that built other AGIs might also build them to experience pain and suffering (that it was entirely indifferent to), if that happened to be an effective motivational system.

2) If this did happen, then there's also some speculation suggesting that an AI that wanted to stay in charge might not want to give its worker AGIs much in the way of things that looked like positive emotions, but did have a reason to give them things that looked like negative emotions. Which would then tilt the balance of pleasure vs. pain in the post-AGI world much more heavily in favor of (emotional) pain.

Now the second claim is much more speculative and I don't even know if I'd consider it a particularly likely scenario (probably not); we just put it in since much of the paper was just generally listing various possibilities of what might happen. But the first claim - that since all the biological minds we know of seem to run on something like pain and pleasure, we should put a substantial probability on AGI architectures also ending up with something like that - seems much stronger to me.

comment by quila · 2024-05-08T03:48:29.660Z · LW(p) · GW(p)

On Pivotal Acts

(edit: status: not a crux, instead downstream of different beliefs about what the first safe ASI will look like in predicted futures where it exists [LW(p) · GW(p)]. If I instead believed 'task-aligned superintelligent agents' were the most feasible form of pivotally useful AI, I would then support their use for pivotal acts.)

I was rereading some of the old literature on alignment research sharing policies after Tamsin Leake's recent post [LW · GW] and came across some discussion of pivotal acts [LW · GW] as well.

Hiring people for your pivotal act project is going to be tricky. [...] People on your team will have a low trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time forming good-faith collaboration. This will alienate other institutions and make them not want to work with you or be supportive of you.

This is in a context where the 'pivotal act' example is using a safe ASI to shut down all AI labs.[1]

My thought is that I don't see why a pivotal act needs to be that. I don't see why shutting down AI labs or using nanotech to disassemble GPUs on Earth [LW · GW] would be necessary. These may be among the 'most direct' or 'simplest to imagine' possible actions, but in the case of superintelligence, simplicity is not a constraint.

We can instead select for the 'kindest' or 'least adversarial' or actually: functional-decision-theoretically [? · GW] optimal actions that save the future while minimizing the amount of adversariality this creates in the past (present).

Which can be broadly framed as 'using ASI for good'. Which is what everyone wants, even the ones being uncareful about its development.

Capabilities orgs would be able to keep working on fun capabilities projects in those days during which the world is saved, because a group following this policy would choose to use ASI to make the world robust to the failure modes of capabilities projects rather than shutting them down. Because superintelligence is capable of that, and so much more.

  1. ^

    side note: It's orthogonal to the point of this post, but this example also makes me think: if I were working on a safe ASI project, I wouldn't mind if another group who had discreetly built safe ASI used it to shut my project down, since my goal is 'ensure the future lightcone is used in a valuable, tragedy-averse way' and not 'gain personal power' or 'have a fun time working on AI' or something. In my morality, it would be naive to be opposed to that shutdown. But to the extent humanity is naive, we can easily do something else in that future to create better present dynamics (as the maintext argues).

    If there is a group for whom it is problematic to use ASI to make the world robust to risks and free of harm, in a way where its actions don't infringe on ongoing non-violent activities, then this post doesn't apply to them; their issue all along was not with the character of the pivotal act, but instead possibly with something like 'having my personal cosmic significance as a capabilities researcher stripped away by the success of an external alignment project'.

    Another disclaimer: This post is about a world in which safely usable superintelligence has been created, but I'm not confident that anyone (myself included) currently has a safe and ready method to create it with. This post shouldn't be read as an endorsement of possible current attempts to do this. I would of course prefer if this civilization were one which could coordinate such that no groups were presently working on ASI, precluding this discourse.

Replies from: Wei_Dai, MakoYass, Vladimir_Nesov, mesaoptimizer
comment by Wei Dai (Wei_Dai) · 2024-05-08T06:58:48.424Z · LW(p) · GW(p)

These may be among the ‘most direct’ or ‘simplest to imagine’ possible actions, but in the case of superintelligence, simplicity is not a constraint.

I think it is considered a constraint by some because they think that it would be easier/safer to use a superintelligent AI to do simpler actions, while alignment is not yet fully solved. In other words, if alignment was fully solved, then you could use it to do complicated things like what you suggest, but there could be an intermediate stage of alignment progress where you could safely use SI to do something simple like "melt GPUs" but not to achieve more complex goals.

Replies from: quila
comment by quila · 2024-05-08T07:12:55.476Z · LW(p) · GW(p)

it is considered a constraint by some because they think that it would be easier/safer to use a superintelligent AI to do simpler actions, while alignment is not yet fully solved

Agreed that some think this, and agreed that formally specifying a simple action policy is easier than a more complex one.[1] 

I have a different model of what the earliest safe ASI will look like, in most futures where one exists. Rather than a 'task-aligned' agent, I expect it to be a non-agentic system which can be used to e.g. come up with pivotal actions for the human group to take / information to act on.[2]

  1. ^

    although formal 'task-aligned agency' seems potentially more complex than the one attempt at a 'full' outer alignment solution I'm aware of (QACI): specifying what a {GPU, AI lab, shutdown of an AI lab} is seems more complex than specifying QACI.

  2. ^

    I think these systems are more attainable, see this post [LW · GW] to possibly infer more info (it's proven very difficult for me to write in a way that I expect will be moving to people who have a model focused on 'formal inner + formal outer alignment', but I think evhub has done so well).

Replies from: quila
comment by quila · 2024-05-08T09:35:04.625Z · LW(p) · GW(p)

Reflecting on this more, I wrote in a discord server (then edited to post here):

I wasn't aware the concept of pivotal acts was entangled with the frame of formal inner+outer alignment as the only (or only feasible?) way to cause safe ASI.

I suspect that by default, I and someone operating in that frame might mutually believe each other's agendas to be probably-doomed. This could make discussion more valuable (as in that case, at least one of us should make a large update).

For anyone interested in trying that discussion, I'd be curious what you think of the post linked above [LW · GW]. As a comment on it says:

I found myself coming back to this now, years later, and feeling like it is massively underrated. Idk, it seems like the concept of training stories is great and much better than e.g. "we have to solve inner alignment and also outer alignment" or "we just have to make sure it isn't scheming."

In my view, solving formal inner alignment, i.e. devising a general method to create ASI with any specified output-selection policy, is hard enough that I don't expect it to be done.[1] This is why I've been focusing on other approaches which I believe are more likely to succeed.

 

  1. ^

    Though I encourage anyone who understands the problem and thinks they can solve it to try to prove me wrong! I can sure see some directions and I think a very creative human could solve it in principle. But I also think a very creative human might find a different class of solution that can be achieved sooner. (Like I've been trying to do :)

comment by mako yass (MakoYass) · 2024-05-08T06:57:15.749Z · LW(p) · GW(p)

Imagining a pivotal act of generating very convincing arguments for like voting and parliamentary systems that would turn government into 1) a working democracy 2) that's capable of solving the problem. Citizens and congress read arguments, get fired up, problem is solved through proper channels.

comment by Vladimir_Nesov · 2024-05-08T18:29:02.058Z · LW(p) · GW(p)

See minimality principle:

the least dangerous plan is not the plan that seems to contain the fewest material actions that seem risky in a conventional sense, but rather the plan that requires the least dangerous cognition from the AGI executing it

comment by mesaoptimizer · 2024-05-08T06:55:54.803Z · LW(p) · GW(p)

My thought is that I don’t see why a pivotal act needs to be that.

Okay. Why do you think Eliezer proposed that, then?

Replies from: quila
comment by quila · 2024-05-08T07:14:19.195Z · LW(p) · GW(p)

(see reply to Wei Dai)

comment by quila · 2024-06-02T14:40:10.253Z · LW(p) · GW(p)

edit: i think i've received enough expressions of interest (more would have diminishing value but you're still welcome to), thanks everyone!

i recall reading in one of the MIRI posts that Eliezer believed a 'world model violation' would be needed for success to be likely.

i believe i may be in possession of such a model violation and am working to formalize it, where by formalize i mean write in a way that is not 'hard-to-understand intuitions' but 'very clear text that leaves little possibility for disagreement once understood'. it wouldn't solve the problem, but i think it would make it simpler so that maybe the community could solve it.

if you'd be interested in providing feedback on such a 'clearly written version', please let me know as a comment or message.[1] (you're not committing to anything by doing so, rather just saying "im a kind of person who would be interested in this if your claim is true"). to me, the ideal feedback is from someone who can look at the idea under 'hard' assumptions (of the type MIRI has) about the difficulty of pointing an ASI, and see if the idea seems promising (or 'like a relevant model violation') from that perspective.

  1. ^

    i don't have many contacts in the alignment community

Replies from: Seth Herd, quetzal_rainbow, D0TheMath, None
comment by Seth Herd · 2024-06-02T15:31:36.879Z · LW(p) · GW(p)

I'm game! We should be looking for new ideas, so I'm happy to look at yours and provide feedback.

comment by quetzal_rainbow · 2024-06-02T17:41:22.207Z · LW(p) · GW(p)

Consider me in

comment by Garrett Baker (D0TheMath) · 2024-06-02T16:49:58.199Z · LW(p) · GW(p)

Historically I’ve been able to understand others’ vague ideas & use them in ways they endorse. I can’t promise I’ll read what you send me, but I am interested.

comment by [deleted] · 2024-06-02T21:27:07.451Z · LW(p) · GW(p)

Maybe you can say a bit about what background someone should have to be able to evaluate your idea.

comment by quila · 2024-05-20T04:17:46.208Z · LW(p) · GW(p)

A quote from an old Nate Soares post that I really liked:

It is there, while staring the dark world in the face, that I find a deep well of intrinsic drive. It is there that my resolve and determination come to me, rather than me having to go hunting for them.

I find it amusing that "we need lies because we can't bear the truth" is such a common refrain, given how much of my drive stems from my response to attempting to bear the truth.

I find that it's common for people to tell themselves that they need the lies in order to bear reality. In fact, I bet that many of you can think of one thing off the top of your heads that you're intentionally tolerifying, because the truth is too scary to even consider. (I've seen at least a dozen failed relationships dragged out for months and months due to this effect.)

I say, if you want the intrinsic drive, drop the illusion. Refuse to tolerify. Face the facts that you feared you would not be able to handle. You are likely correct that they will be hard to bear, and you are likely correct that attempting to bear them will change you. But that change doesn't need to break you. It can also make you stronger, and fuel your resolve.

So see the dark world. See everything intolerable. Let the urge to tolerify it build, but don't relent. Just live there in the intolerable world, refusing to tolerate it. See whether you feel that growing, burning desire to make the world be different. Let parts of yourself harden. Let your resolve grow. It is here, in the face of the intolerable, that you will be able to tap into intrinsic motivation.

comment by quila · 2024-05-16T16:01:52.097Z · LW(p) · GW(p)

(Personal) On writing and (not) speaking

I often struggle to find words and sentences that match what I intend to communicate.

Here are some problems this can cause:

  1. Wordings that are odd or unintuitive to the reader, but that are at least literally correct.[1]
  2. Not being able to express what I mean, and having to choose between not writing it, or risking miscommunication by trying anyways. I tend to choose the former unless I'm writing to a close friend. Unfortunately this means I am unable to express some key insights to a general audience.
  3. Writing taking lots of time: I usually have to iterate many times on words/sentences until I find one which my mind parses as referring to what I intend. In the slowest cases, I might finalize only 2-10 words per minute. Even after iterating, my words are still sometimes interpreted in ways I failed to foresee.

These apply to speaking, too. If I speak what would be the 'first iteration' of a sentence, there's a good chance it won't create an interpretation matching what I intend to communicate. In spoken language I have no chance to constantly 'rewrite' my output before sending it. This is one reason, but not the only reason, that I've had a policy of trying to avoid voice-based communication.

I'm not fully sure what caused this relationship to language. It could be that it's just a byproduct of being autistic. It could also be a byproduct of out-of-distribution childhood abuse.[2]

  1. ^

    E.g., once I couldn't find the word 'clusters,' and wrote a complex sentence referring to 'sets of similar' value functions each corresponding to a common alignment failure mode / ASI takeoff training story. (I later found a way to make it much easier to read)

  2. ^

    (Content warning)

    My primary parent was highly abusive, and would punish me for using language in the intuitive 'direct' way about particular instances of that. My early response was to try to euphemize and say-differently in a way that less directly contradicted the power dynamic / social reality she enforced.

    Eventually I learned to model her as a deterministic system and stay silent / fawn.

Replies from: Emrik North, quila, quila, weightt-an
comment by Emrik (Emrik North) · 2024-05-16T22:35:07.353Z · LW(p) · GW(p)

Aaron Bergman has a vid of himself typing new sentences in real-time, which I found really helpfwl.[1] I wish I could watch lots of people record themselves typing, so I could compare what I do.

Being slow at writing can be a sign of failure or of winning, depending on the exact reasons why you're slow. I'd worry about being "too good" at writing, since that'd be evidence that your brain is conforming your thoughts to the language, instead of conforming your language to your thoughts. English is just a really poor medium for thought (at least compared to e.g. visuals and pre-word intuitive representations), so it's potentially dangerous to care overmuch about it.

  1. ^

    Btw, Aaron is another person-recommendation. He's awesome. Has really strong self-insight, goodness-of-heart, creativity. (Twitter profile, blog+podcast, EAF [EA · GW], links.) I haven't personally learned a whole bunch from him yet,[2] but I expect if he continues being what he is, he'll produce lots of cool stuff which I'll learn from later.

  2. ^

    Edit: I now recall that I've learned from him: screwworms (important), and the ubiquity of left-handed chirality in nature (mildly important). He also caused me to look into two-envelopes paradox, which was usefwl for me.

    Although I later learned about screwworms from Kevin Esvelt at 80kh podcast, so I would've learned it anyway. And I also later learned about left-handed chirality from Steve Mould on YT, but I may not have reflected on it as much.

Replies from: aaron-bergman, quila
comment by Aaron Bergman (aaron-bergman) · 2024-05-17T09:50:28.062Z · LW(p) · GW(p)

Thank you, that is all very kind! ☺️☺️☺️

I expect if he continues being what he is, he'll produce lots of cool stuff which I'll learn from later.

I hope so haha

comment by quila · 2024-05-16T22:59:26.828Z · LW(p) · GW(p)

Record yourself typing?

Replies from: Emrik North
comment by Emrik (Emrik North) · 2024-05-17T03:43:52.706Z · LW(p) · GW(p)

EDIT: I uploaded a better example here (18m18s):

 

Old example still here (7m25s).

Replies from: Emrik North
comment by Emrik (Emrik North) · 2024-07-30T04:08:38.928Z · LW(p) · GW(p)

Ah, most relevant: Paul Graham has a recording-of-sorts of himself writing a blog post "Startups in 13 sentences".

comment by quila · 2024-07-12T00:15:53.402Z · LW(p) · GW(p)

I think I've become better at writing clearly, relative to before. Part of it is just practice. A lesson that also feels relevant:

Write very precisely, meaning there is no non-trivial space of possible interpretations that you don't intend. Unless you do this, people may not respond to what you really mean, even if you consider it obvious.

comment by quila · 2024-05-30T22:36:56.208Z · LW(p) · GW(p)

Maybe someone has advice for finalizing-writing faster (not at the expense of clarity)? I think I can usually end up with something that's clear, at least if it's just a basic point that's compatible with the reader's ontology, but it still takes a long time.

comment by weightt an (weightt-an) · 2024-05-16T19:11:55.166Z · LW(p) · GW(p)

Even after iterating, my words are often interpreted in ways I failed to foresee.

It's also partly a problem with the recipient of the communicated message. Sometimes you both have very different background assumptions/intuitive understandings. Sometimes it's just a skill issue, and the person you are talking to is bad at parsing, and all the work of keeping the discussion on the important things / away from trivial undesirable sidelines is left to you.

Certainly it's useful to know how to pick your battles and see if this discussion/dialogue is worth what you're getting out of it at all.

comment by quila · 2024-12-21T18:46:35.601Z · LW(p) · GW(p)

i observe that processes seem to have a tendency towards what i'll call "surreal equilibria". [status: trying to put words to a latent concept. may not be legible, feel free to skip. partly 'writing as if i know the reader will understand' so i can write about this at all. maybe it will interest some.]

progressively smaller-scale examples:

  • it's probably easiest to imagine this with AI neural nets, procedurally following some adapted policy even as the context changes from the one they grew in. if these systems have an influential, hard to dismantle role, then they themselves become the rules governing the progression of the system for whatever arises next, themselves ceasing to be the actors or components they originally were; yet as they are "intelligent" they still emit the words as if the old world is true; they become simulacra, the automatons keep moving as they were, this is surreal. going out with a whimper [LW · GW].
  • early → late-stage capitalism. early → late-stage democracy.
    • structures which became ingrained as rules of the world. note the difference between "these systems have Naturally Changed from an early to late form" and "these systems became persistent constraints, and new adapted optimizers sprouted within them".

it looks like i'm trying to describe an iterative pattern of established patterns becoming constraints bearing permanent resemblance to what they were, and new things sprouting up within the new context / constrained world, eventually themselves becoming constraints.[1]

i also had in mind smaller scale examples.

  • a community forms around some goal and decides to moderate and curate itself in some consistent way, hoping this will lead to good outcomes; eventually the community is no longer the thing it set out to be; the original principles became the constraints. (? - not sure how much this really fits)
  • a group of internet friends agrees to regularly play a forum game but eventually they're each just 'going along with it', no longer passionate about the game itself. "continuing to meet to do the thing" was a policy and stable meta-pattern that continued beyond its original context. albeit in this case it was an easily disrupted pattern. but for a time it led to a kind of deadness in behavior, me and those friends became surreal?
    • this is possibly a stretch from what i was originally describing. i'm just sampling words from my mind, here, and hoping they correlate to the latent which i wanted to put words to.

this feels related to goodhart, but where goodhart is framed more individually, and this is more like "a learned policy and its original purpose coming apart as a tendency of reality".

  1. ^

    tangential: in this frame physics can be called the 'first constraint'

Replies from: Viliam, tailcalled
comment by Viliam · 2024-12-22T21:43:07.311Z · LW(p) · GW(p)

The most likely/frequent outcome of "trying to build something that will last" is failure. You tried to build an AI, but it doesn't work. You tried to convince people that trade is better than violence, they cooked you for dinner. You tried to found a community, no one was interested. A group of friends couldn't decide when and where to meet.

But if you succeed in... creating a pattern that keeps going on... then the thing you describe is the second most likely outcome. It turns out that your initial creation had parts that were easier or harder to replicate, and the easier ones keep going and growing, and the harder ones gradually disappear. The fluffy animal died, but its skeleton keeps walking.

It's like casting an animation spell on a thing, and finding out that the spell only affects certain parts of the thing, if any.

comment by tailcalled · 2024-12-21T21:13:05.627Z · LW(p) · GW(p)

I would distinguish two variants of this. There's just plain inertia, like if you have a big pile of legacy code that accumulated from a lot of work, then it takes a commensurate amount of work to change it. And then there's security, like a society needs rules to maintain itself against hostile forces. The former is sort of accidentally surreal, whereas the latter is somewhat intentionally so, in that a tendency to re-adapt would be a vulnerability.

comment by quila · 2023-12-06T20:39:14.727Z · LW(p) · GW(p)

Here's a tampermonkey script that hides the agreement score on LessWrong. I wasn't enjoying this feature because I don't want my perception to be influenced by that; I want to judge purely based on ideas, and on my own.

Here's what it looks like:

// ==UserScript==
// @name         Hide LessWrong Agree/Disagree Votes
// @namespace    http://tampermonkey.net/
// @version      1.0
// @description  Hide agree/disagree votes on LessWrong comments.
// @author       ChatGPT4
// @match        https://www.lesswrong.com/*
// @grant        none
// ==/UserScript==

(function() {
    'use strict';

    // Function to hide agree/disagree votes
    function hideVotes() {
        // Select all elements representing agree/disagree votes
        var voteElements = document.querySelectorAll('.AgreementVoteAxis-voteScore');

        // Loop through each element and hide it
        voteElements.forEach(function(element) {
            element.style.display = 'none';
        });
    }

    // Run the function when the page loads
    hideVotes();

    // Optionally, set up a MutationObserver to hide votes on dynamically loaded content
    var observer = new MutationObserver(function() {
        hideVotes();
    });

    // Start observing the document for changes
    observer.observe(document, { childList: true, subtree: true });
})();

Replies from: mir-anomaly
comment by Mir (mir-anomaly) · 2024-01-02T13:17:39.182Z · LW(p) · GW(p)

I don't know the full original reasoning for why they introduced it, but one hope is that it marginally disentangles agreement from the main voting axis. People who were going to upvote based purely on agreement will now put their vote in the agreement axis instead (is the hope, anyway). Agreement-voting is socioepistemologically bad in general (except for in polls), so this seems good.

comment by quila · 2024-07-25T09:02:50.110Z · LW(p) · GW(p)

I was looking at this image in a post [LW · GW] and it gave me some (loosely connected/ADD-type) thoughts.

In order:

  1. The entities outside the box look pretty scary.
  2. I think I would get over that quickly, they're just different evolved body shapes. The humans could seem scary-looking from their pov too.
  3. Wait.. but why would the robots have those big spiky teeth? (implicit question: what narratively coherent world could this depict?)
  4. Do these forms have qualities associated with predator species, and that's why they feel scary? (Is this a predator-species-world?)
  5. Most humans are also predators in a non-strict sense.
  6. I don't want to live in a world where there's only the final survivors of selection processes who shrug indifferently when asked why we don't revive all the beings who were killed in the process which created the final survivors. (implicit: related to how a 'predator-species-world' from (4) could exist)
  7. There's been many occasions where I've noticed what feels like a more general version of that attitude in a type of current human, but I don't know how to describe it.

(I mostly ignored the humans-are-in-a-box part.)

Replies from: Jay
comment by Jay · 2024-07-25T10:54:21.362Z · LW(p) · GW(p)

I don't want to live in a world where there's only the final survivors of selection processes who shrug indifferently when asked why we don't revive all the beings who were killed in the process which created the final survivors.

If you could revive all the victims of the selection process that brought us to the current state, all the crusaders and monarchists and vikings and Maoists and so, so many illiterate peasant farmers (on much too little land because you've got hundreds of generations of them at once, mostly with ideas that make Putin look like Sonia Sotomayor), would you?  They'd probably make quite the mess.  Bringing them back would probably restart the selection process and we probably wouldn't be selected again.  It just seems like a terrible idea to me.

Replies from: quila
comment by quila · 2024-07-25T12:01:06.414Z · LW(p) · GW(p)

Some clarifications:

  • I'm thinking of this in the context of a post-singularity future, where we wouldn't need to worry about things like conflict or selection processes.
  • By 'the ones who were killed in the process', I was thinking about e.g herbivorous animals that were killed by predator species[1], but you're correct that it could include humans too. A lot of humans have been unjustly killed (by others or by nature) throughout history.
  • I think my endorsed morals are indifferent about the (dis)value of reviving abusive minds from the past, though moral-patient-me dislikes the idea on an intuitive level, and wishes for a better narrative ending than that.

(Also I upvoted your comment from negative)

I also notice some implied hard moral questions (What of current mean-hearted people? What about the potential for past ones of them to have changed into good people? etc)

  1. ^

    As a clear example of a kind of being who seems innocent of wrongdoing. Not ruling out other cases, e.g plausibly inside the mind of the cat that I once witnessed killing a bunny, there could be total naivety about what was even being done.

    Sort-of relatedly, I basically view evolution as having favored the dominance of agents with defect-y decision-making, even though the equilibrium of 'collaborating with each other to harness the free energy of the sun' would have been so much better. (Maybe another reason that didn't happen is that there would be less of a gradual buildup of harder and harder training environments, in that case)

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2024-07-27T01:17:19.330Z · LW(p) · GW(p)

I'm thinking of this in the context of a post-singularity future, where we wouldn't need to worry about things like conflict or selection processes.

I'm curious why you seem to think we don't need to worry about things like conflict or selection processes post-singularity.

Replies from: quila
comment by quila · 2024-07-27T07:15:53.311Z · LW(p) · GW(p)

Because a benevolent ASI would make everything okay.

(In case worrying about those is something you'd find fun [? · GW], then you could choose to experience contexts where you still would, like complex game/fantasy worlds.)

Replies from: carado-1
comment by Tamsin Leake (carado-1) · 2024-07-27T11:09:36.451Z · LW(p) · GW(p)

To be more precise: extrapolated over time, for any undesired selection process or other problem of that kind, either the problem is large enough that it gets exacerbated over time so much that it eats everything — and then that's just extinction, but slower — or it's not large enough to win out and aligned superintelligence(s) + coordinated human action is enough to stamp it out in the long run, which means they won't be an issue for almost all of the future.

It seems like for a problem to be just large enough that coordination doesn't stamp it away, but also it doesn't eat everything, would be a very fragile equilibrium, and I think that's pretty unlikely.

comment by quila · 2024-07-21T03:01:04.144Z · LW(p) · GW(p)

random idea for a voting system (i'm a few centuries late. this is just for fun.)

instead of voting directly, everyone is assigned to a discussion group of x people (say 5): themself and others near them. the group meets to discuss at an official location (attendance is optional). only if those who showed up reach consensus does the group cast one vote.

many of these groups would not reach consensus, say 70-90%. that's fine. the point is that most of the ones which do would be composed of people who make and/or are receptive to valid arguments. this would then shift the memetic focus of politics towards rational arguments instead of being mostly rhetoric/bias reinforcement (which is unlikely to produce consensus when repeated in this setting).

possible downside: another possible equilibrium is memetics teaching people how to pressure others into agreeing during the group discussion, when e.g it's 3 against 2 or 4 against 1. possible remedy: have each discussion group be composed of a proportional amount of each party's supporters. or maybe have them be 1-on-1 discussions instead of groups of x>2 because those tend to go better anyways.

also, this would let misrepresented minority positions be heard correctly.

i don't think this would have saved humanity from ending up in an inadequate equilibrium [? · GW], but maybe would have at least been less bad.
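The filtering mechanism above can be sketched as a toy Monte Carlo simulation. This is only an illustration of the stated assumption (that a group "reaches consensus via valid arguments" exactly when every attendee is argument-receptive), not evidence for it; `pReceptive`, `groupSize`, and the unanimity rule are all invented parameters of the sketch.

```javascript
// Toy model of the proposal: each voter is independently
// "argument-receptive" with probability pReceptive (an assumed free
// parameter), and a group casts its single vote only if every attendee
// is receptive -- a stand-in for "consensus via valid arguments".
function simulate(nGroups = 10000, groupSize = 5, pReceptive = 0.5) {
  let consensusGroups = 0;
  for (let i = 0; i < nGroups; i++) {
    // Sample one group; each member is independently receptive or not.
    let allReceptive = true;
    for (let j = 0; j < groupSize; j++) {
      if (Math.random() >= pReceptive) {
        allReceptive = false;
        break;
      }
    }
    // Only unanimous (all-receptive) groups cast their one vote.
    if (allReceptive) consensusGroups++;
  }
  return consensusGroups / nGroups;
}

const rate = simulate();
console.log(`share of groups casting a vote: ${rate}`);
// Under these assumptions the rate concentrates near 0.5 ** 5 = 0.03125,
// matching the "most groups would not reach consensus" regime above.
```

Under this (contestable) model, the few groups that do vote are all composed of receptive members, which is the filtering effect the shortform gestures at; the disagreement in the replies below is precisely about whether the unanimity rule tracks argument quality rather than social dominance.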

Replies from: andrei-alexandru-parfeni
comment by sunwillrise (andrei-alexandru-parfeni) · 2024-07-21T04:12:13.142Z · LW(p) · GW(p)

the point is that most of the ones which do would be composed of people who make and/or are receptive to valid arguments

I strongly disagree with this, as a descriptive matter of how the vast majority of groups of regular (neurotypical) people function. 

I would expect that the groups which reach consensus would generally do so because whichever of the 5 individuals has the greatest combination of charisma, social skills, and assertiveness in dialogue would domineer the discussion and steer it in a direction where whoever else might disagree gets conversationally out-skilled to the point where social pressure from everyone else gets them to give up and drop their objections (likely by actually subjectively feeling that they get convinced by the arguments of the charismatic person, when in reality it's just social proof doing the work).

I think the fact that you don't expect this to happen is more due to you improperly generalizing [LW · GW] from the community of LW-attracted people (including yourself), whose average psychological make-up appears to me to be importantly different from that of the broader public.

Replies from: quila
comment by quila · 2024-07-21T04:16:45.640Z · LW(p) · GW(p)

Please don't make unfounded speculation[1] about my psychology. I feel pressured to respond just to say that's not true (that I am not generalizing from lesswrong users).

the groups which reach consensus would generally do so because whichever of the 5 individuals has greatest combination of charisma, social skills, and assertiveness in dialogue would domineer the discussion

(That was a possible failure mode mentioned, I don't know why you're reiterating it with just more detail). My impression was that many neurotypicals are used (/desensitized) to that happening by now, and that there might frequently be attempts from multiple people which would not be resolved.

But this was not a strongly held belief, nor a topic that seems important at this phase of history; it was just a fun-idea-shortform. I feel discouraged by what I perceive to be the assertiveness/assumingness of your comment.

  1. ^

    (edit: I agree correctly-hedged speculation is okay and would have been okay here, I meant something like confidently-expressed claims about another user's mind with low evidence.)

Replies from: andrei-alexandru-parfeni, andrei-alexandru-parfeni
comment by sunwillrise (andrei-alexandru-parfeni) · 2024-07-21T04:30:47.559Z · LW(p) · GW(p)

I disagree that the speculation was unfounded. I checked your profile [LW · GW] before making that comment (presumably written by you, and thus a very well-founded source) and saw "~ autistic." I would not have made that statement, as written, if this had not been the case (for instance the part of "including yourself").

Then, given my past experience with similar proposals that were written about on LW, in which other users correctly pointed out the problems with the proposal and it was revealed that the OP was implicitly making assumptions that the broader community was akin to that of LW, it was reasonable to infer that the same was happening here. (It still seems reasonable to infer this, regardless of your comment, but that is beside the point.) In any case, I said "think" which signaled that I understood my speculation was not necessarily correct. 

I have written up my thoughts [LW(p) · GW(p)] before on why good moderation practices should not allow for the mind-reading of others, but I strongly oppose any norm that says that mere speculation, explicitly labeled as such through language that signals some epistemic humility, is inherently bad. I even more strongly oppose a norm under which other users feeling pressured to respond has a meaningful impact on whether a comment is proper or not.

I expect your comment to not have been a claim about the norms of LW, but rather a personal request. If so, I do not expect to comply (unless required to by moderation).

Replies from: quila
comment by quila · 2024-07-21T04:55:17.372Z · LW(p) · GW(p)

I don't agree that my bio stating I'm autistic[1] is strong/relevant* evidence that I assume the rest of the world is like me or LessWrong users; I'm very aware that this is not the case. I feel a lot of uncertainty about what happens inside the minds of neurotypical people (and most others), but I know they're very different in various specific ways, and I don't think the assumption you inferred is one I make; it was directly implied in my shortform that neurotypicals engage in politics in a really irrational way, are influenceable by such social pressures as you (and I) mentioned, etc.

*Technically, being a LessWrong user is some Bayesian evidence that one makes that assumption, if that's all you know about them, so I added the hedge "strong/relevant", i.e. enough to reasonably cause one to write "I think you are making [clearly-wrong assumption x]" instead of using more uncertain phrasings.

I even more strongly oppose a norm that other users feeling pressured to respond should have a meaningful impact on whether a comment is proper or not.

I agree that there are cases where feeling pressured to respond is acceptable. E.g., if someone writes a counterargument which one think misunderstands their position, they might feel some internal pressure to respond to correct this; I think that's okay, or at least unavoidable.

I don't know how to define a general rule for determining when making-someone-feel-pressured is okay or not, but this seemed like a case where it was not okay: in my view, it was caused by an unfounded confident expression of belief about my mind.

If you internally believe you had enough evidence to infer what you wrote at the level of confidence to just be prefaced with 'I think', perhaps it should not be against LW norms, though; I don't have strong opinions on what site norms should be, or how norms should differ when the subject is the internal mind of another user.

More on norms: the assertive writing style of your two comments here seems also possibly norm-violating as well.

Edit: I'm flagging this for moderator review.

  1. ^

    the "~ " you quoted is just a separator from the previous words, in case you thought it meant something else

Replies from: habryka4
comment by habryka (habryka4) · 2024-07-21T22:28:00.942Z · LW(p) · GW(p)

As a moderator: I do think sunwillrise was being a bit obnoxious here. I think the norms they used here were fine for frontpage LW posts, but shortform is trying to do something that is more casual and more welcoming of early-stage ideas, and this kind of psychologizing, I think, has reasonably strong chilling effects on people feeling comfortable with that.

I don't think it's a huge deal, my best guess is I would just ask sunwillrise to comment less on quila's stuff in particular, and if it becomes a recurring theme, to maybe more generally try to change how they comment on shortforms.

I do think the issue here is kind of subtle. I definitely notice an immune reaction to sunwillrise's original comment, but I can't fully put into words why I have that reaction, and I would also have that reaction if it was made as a comment on a frontpage post (but I would just be more tolerant of it). 

I think the fact that you don't expect this to happen is more due to you improperly generalizing [LW · GW] from the community of LW-attracted people (including yourself), whose average psychological make-up appears to me to be importantly different from that of the broader public.

Like, I think my key issue here is that sunwillrise just started a whole new topic that quila had expressed no interest in talking about, which is the topic of "what are my biases on this topic, and if I am wrong, what would be the reason I am wrong?", which, like, IDK, is a fine topic, but it is just a very different topic that doesn't really have anything to do with the object level. Like, whether quila is biased on this topic does not make a difference to the question of whether this policy-esque proposal would be a good idea, and I think quila (and most other readers) are usually more interested in discussing that than meta-level bias stuff.

There is also a separate thing, where making this argument in some sense assumes that you are right, which I think is a fine thing to do, but does often make good discussion harder. Like, I think for comments, it's usually best to focus on the disagreement, and not to invoke random other inferences about what would be true if you are right. There can be a place for that, especially if it helps elucidate your underlying world model, but I think in this case little of that happened.

comment by sunwillrise (andrei-alexandru-parfeni) · 2024-07-21T05:17:23.359Z · LW(p) · GW(p)

(That was a possible failure mode mentioned, I don't know why you're reiterating it with just more detail)

Separately from the more meta discussion about norms, I believe the failure mode I mentioned is quite different from yours in an important respect that is revealed by the potential remedy you pointed out [LW(p) · GW(p)] ("have each discussion group be composed of a proportional amount of each party's supporters. or maybe have them be 1-on-1 discussions instead of groups of x>2 because those tend to go better anyways").

Together with your explanation of the failure mode ("when e.g it's 3 against 2 or 4 against 1"), it seems to me like you are thinking of a situation where one Republican, for instance, is in a group with 4 Democrats, and thus feels pressure from all sides in a group discussion because everyone there has strong priors that disagree with his/hers. Or, as another example, when a person arguing for a minority position is faced with 4 others who might be aggressively conventional-minded and instantly disapprove of any deviation from the Overton window. (I could very easily be misinterpreting what you are saying, though, so I am less than 95% confident of your meaning.)

In this spot, the remedy makes a lot of sense: prevent these gang-up-on-the-lonely-dissenter [LW · GW] spots by making the ideological make-up of the group more balanced or by encouraging 1-on-1 conversations in which each ideology or system of beliefs will only have one representative arguing for it.

But I am talking about a failure mode that focuses on the power of one single individual to swing the room towards him/her, regardless of how many are initially on his/her side from a coalitional perspective. Not because those who disagree are initially in the minority and thus cowed into staying silent (and fuming, or in any case not being internally convinced [LW · GW]), but rather because the "combination of charisma, social skills, and assertiveness in dialogue" would take control of the conversation and turn the entire room in its favor, likely by getting the others to genuinely believe that they are being persuaded for rational reasons instead of social proof.

This seems importantly different from your potential downside, as can be seen by the fact that the proposed remedy would not be of much use here; the Dark Arts [LW · GW] conversational superpowers would be approximately as effective in 1-on-1 discussions as in group chats (perhaps even more so in some spots, since there would be nobody else in the room to potentially call out the missing logic or misleading rhetoric, etc.) and would still remain impactful even if the room was ideologically mixed to start.

To clarify, I do not expect the majority of such conversations to actually result in a clever arguer [LW · GW] that's good at conversations being able to convince those who disagree to come around to his/her position (the world is not lacking for charismatic and ambitious people, so I would expect everything around us to look quite different if convincing others to change their political leanings was simple). But, conditional on the group having reached consensus, I do predict, with high probability, that it did so because of these types of social dynamics rather than because they are composed of people that react well to "valid arguments" that challenge closely-held political beliefs.

(edit: wrote this before I saw the edit in your most recent comment. Feel free to ignore all of this until the matter gets resolved)

Replies from: quila
comment by quila · 2024-07-21T06:05:50.376Z · LW(p) · GW(p)

I think this is a good object-level comment.

Meta-level response about "did you mean this or rule it out/not have a world model where it happens?":

Some senses in which you're right that it's not what I was meaning:

  • It's more specific/detailed. I was not thinking in this level of detail about how such discussions would play out.
  • I was thinking more about pressure than about charisma (where someone genuinely seems convincing). And yes, charisma could be even more powerful in a 1-on-1 setting.

Senses in which it is what I meant:

  • This is not something my world model rules out; it just wasn't zoomed in on, possibly because I'm used to sometimes experiencing a lot of pressure from neurotypical people over my beliefs. (that could have biased my internal frame to overfocus on pressure).
  • For the parts about more even distributions being better, it's more about: yes, these dynamics exist, but I thought they'd be even worse when combined with a background conformity pressure, e.g. when there's one dominant-pressuring person and everyone but you is passively agreeing with what they're saying, and tolerating it because they agree.

Object-level response:

conditional on the group having reached consensus, I do predict, with high probability, that it did so because of these types of social dynamics rather than because they are composed of people that react well to "valid arguments" that challenge closely-held political beliefs.

(First, to be clear: the beliefs don't have to be closely-held; we'd see consensuses more often when for {all but at most one side} they're not)

That seems plausible. We could put it into a (handwavey) calculation form, where P(1 dark arts arguer) is higher than P(5 truth-seekers). But it's actually a lot more complex; e.g., what about P(all opposing participants susceptible to such an arguer), or how, e.g., one more-truth-seeking attitude can influence others to have a similar attitude in that context? (and this is without me having good priors on the frequencies and degrees of these qualities, so I'm mostly uncertain).
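A minimal sketch of that handwavey calculation, with all the probabilities made up purely for illustration (they are not estimates):

```python
# Illustrative only: every number here is invented for the sketch.
p_dark = 0.05    # chance a given participant is a skilled dark-arts arguer
p_seeker = 0.10  # chance a given participant reacts well to valid arguments

# In a 5-person group, consensus-via-dark-arts needs only one such arguer,
# while consensus-via-reasoning needs (roughly) all five to be truth-seekers.
p_one_dark = 1 - (1 - p_dark) ** 5   # P(at least 1 dark arts arguer)
p_all_seekers = p_seeker ** 5        # P(5 truth-seekers)

print(p_one_dark)     # ≈ 0.226
print(p_all_seekers)  # ≈ 0.00001
```

With these toy numbers the "one clever arguer" route is four orders of magnitude more likely, though, as noted above, the real situation involves correlations (susceptibility, attitude contagion) this independence assumption ignores.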

A world with such a proposal implemented might even then see training programs for clever dark arts arguing. (Kind of like I mentioned at the start, but again with me using the case of pressuring specifically: "memetics teaching people how to pressure others into agreeing during the group discussion")

comment by quila · 2024-06-16T12:55:22.328Z · LW(p) · GW(p)

i am kind of worried by the possibility that this is not true: there is an 'ideal procedure for figuring out what is true [LW(p) · GW(p)]'.

for that to be not true, it would mean that: for any (or some portion of?) task(s), the only way to solve it is through something like a learning/training process (in the AI sense), or other search-process-involving-checking. it would mean that there's no 'reason' behind the solution being what it is, it's just a {mathematical/logical/algorithmic/other isomorphism} coincidence.

for it to be true, i guess it would mean that there's another procedure ({function/program}) that can deduce the solution in a more 'principled'[1] way (could be more or less efficient)

more practically, it being not true would be troubling for strategies based on 'create the ideal intelligence-procedure and use it as an oracle [or place it in a formal-value-containing hardcoded-structure that uses it like an oracle]'

why do i think it's possible for it to be not true? because we currently observe training processes succeeding, but don't yet know of an ideal procedure[2]. that's all. a mere possibility, not a 'positive argument'.

  1. ^

    i don't know exactly what i mean by this

  2. ^

    in case anyone thinks 'bayes theorem / solomonoff induction!' - bayes theorem isn't it, because, for example, it doesn't alone tell you how to solve a maze. i can try to elaborate if needed
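to make this footnote concrete, here's a minimal instance of the kind of 'search-process-involving-checking' that bayes' theorem alone doesn't give you: breadth-first search through a maze (the maze layout is made up for illustration):

```python
from collections import deque

# A made-up maze: '#' walls, 'S' start, 'G' goal.
maze = ["S.#",
        ".##",
        "..G"]

def solve(maze):
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if maze[r][c] == "S")
    frontier = deque([(start, [start])])  # (cell, path-so-far) pairs
    seen = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if maze[r][c] == "G":
            return path  # a checked, working solution: found, not deduced
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "#" and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    # returns None if unsolvable

print(solve(maze))  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

the point being: the procedure checks candidate paths until one works, rather than deriving the answer from a prior-updating rule.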

Replies from: Mitchell_Porter
comment by Mitchell_Porter · 2024-06-17T09:43:26.609Z · LW(p) · GW(p)

I think there's no need to think of "training/learning" algorithms as absolutely distinct from "principled" algorithms. It's just that the understanding of why deep learning works is a little weak, so we don't know how to view it in a principled way. 

Replies from: quila, quila
comment by quila · 2024-06-17T10:51:29.074Z · LW(p) · GW(p)

the understanding of why deep learning works is a little weak, so we don't know how to view it in a principled way.

It sounds like you're saying, "deep learning itself is actually approximating some more ideal process." (I have no comments on that, but I find it interesting to think about what that process would be, and what its safety-relevant properties would be)

comment by quila · 2024-06-17T10:44:23.118Z · LW(p) · GW(p)
comment by quila · 2024-01-28T02:07:33.730Z · LW(p) · GW(p)

Mutual Anthropic Capture, A Decision-theoretic Fermi paradox solution

(copied from discord, written for someone not fully familiar with rat jargon)
(don't read if you wish to avoid acausal theory)

simplified setup

  • there are two values. one wants to fill the universe with A, and the other with B.
  • for each of them, filling it halfway is really good, and filling it all the way is just a little bit better. in other words, they are non-linear utility functions.
  • whichever one comes into existence first can take control of the universe, and fill it with 100% of what they want.
  • but in theory they'd want to collaborate to guarantee the 'really good' (50%) outcome, instead of having a one-in-two chance at the 'a little better than really good' (100%) outcome.
  • they want a way to collaborate, but they can't because one of them will exist before the other one, and then lack an incentive to help the other one. (they are both pure function maximizers)
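to make the non-linearity concrete, here's a tiny sketch with made-up utility numbers (0.9 for filling half the universe, 1.0 for filling all of it):

```python
# Made-up utilities expressing "halfway is really good,
# all the way is just a little bit better".
def u(fraction_filled: float) -> float:
    return {0.0: 0.0, 0.5: 0.9, 1.0: 1.0}[fraction_filled]

# Winner-take-all: each value has a one-in-two chance of filling everything.
ev_gamble = 0.5 * u(1.0) + 0.5 * u(0.0)  # = 0.5

# Guaranteed split: each value fills half for sure.
ev_split = u(0.5)  # = 0.9

print(ev_split > ev_gamble)  # the split is better for both -- if they could commit
```

so both values would prefer the guaranteed split over the coin-flip, which is exactly the commitment problem the rest of this shortform resolves.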

how they end up splitting the universe, regardless of which comes first: mutual anthropic capture.

imagine you observe yourself being the first of the two to exist. you reason through all the above, and then add...

  • they could be simulating me, in which case i'm not really the first.
  • were that true, they could also expect i might be simulating them
  • if i don't simulate them, then they will know that's not how i would act if i were first, and be absolved of their worry, and fill the universe with their own stuff.
  • therefore, it's in my interest to simulate them

both simulate each other observing themselves being the first to exist in order to unilaterally prevent the true first one from knowing they are truly first.

from this point they can both observe each other's actions. specifically, they observe each other implementing the same decision policy, which fills the universe with half A and half B iff this decision policy is mutually implemented, and which shuts the simulation down if it's not implemented.

conclusion

in reality there are many possible first entities which take control, not just two, so all of those with non-linear utility functions get simulated.

so, odds are we're being computed by the 'true first' life form in this universe, and that that first life form is in an epistemic state no different from that described here.

Replies from: ward-anomalous
comment by Anomalous (ward-anomalous) · 2024-02-02T10:01:32.973Z · LW(p) · GW(p)

This is an awesome idea, thanks! I'm not sure I buy the conclusion, but I expect having learned about "mutual anthropic capture" will be useful for my thinking on this.

comment by quila · 2024-12-28T21:45:50.603Z · LW(p) · GW(p)

i've wished to have a research buddy who is very knowledgeable about math or theoretical computer science to answer questions or program experiments (given good specification). but:

  • idk how to find such a person
  • such a person may wish to focus on their own agenda instead
  • unless they're a friend already, idk if i have great evidence that i'd be impactful to support.

so: i could instead do the inverse with someone. i am good at having creative ideas, and i could try to have new ideas about your thing, conditional on me (1) being able to {quickly understand it} and reason about it and (2) not thinking it is doomed.

if you want me to potentially try doing this for your focuses, message me here. (constraints: you must be focused on the 'hard problems of alignment', and accept me communicating only with text)

i think that in either starting arrangement, if it worked out well, then our models would eventually overlap in the research-direction-relevant parts and we'd form a kind of superorganism that uses both of our abilities. but it going that well may be rare (i don't actually know!). the cost/benefit looks good to me.

comment by quila · 2024-08-29T23:08:50.448Z · LW(p) · GW(p)

avoiding akrasia by thinking of the world in terms of magic: the gathering effects

example initial thought process: "i should open my laptop just to write down this one idea and then close it and not become distracted".

laptop rules text: "when activated, has an 80% chance of making you become distracted"

new reasoning: "if i open it, i need to simultaneously avoid that 80% chance somehow."

 

why this might help me: (1) i'm very used to strategizing about how to use a kit of this kind of effect, from playing such games. (2) maybe normal reasoning about 'what to do' happens in a frame where i have full control over what i focus on, whereas this frame includes my focus being dependent on my environment

potential downside: same as (2), it conceptualizes away some agency. i.e i could theoretically 'just choose not to enter negative[1] focus-attraction-basins' 100% of the time. but i don't know how to do that 100% of the time, so it works at least as a reflection of the current equilibrium.

  1. ^

    some focus-attraction-basins are positive, e.g for me these include making art and deep thinking, these are the ones i want to strategically use effects to enter

Replies from: Raemon
comment by Raemon · 2024-08-30T00:46:24.290Z · LW(p) · GW(p)

I have some dream of being able to generalize skills from games. People who are good at board games clearly aren't automatically hypercompetent all-around, but I think/hope this is because they aren't making a deliberate effort to generalize.

So, good luck, and let us know how this goes. :)

comment by quila · 2024-04-26T00:35:04.580Z · LW(p) · GW(p)

i'm watching Dominion again to remind myself of the world i live in, to regain passion to Make It Stop

it's already working.

Replies from: quila
comment by quila · 2024-04-26T00:59:16.301Z · LW(p) · GW(p)

when i was younger, pre-rationalist, i tried to go on hunger strike to push my abusive parent to stop funding this.

they agreed to watch this as part of a negotiation. they watched part of it.

they changed their behavior slightly -- as a negotiation -- for about a month.

they didn't care.

they looked horror in the eye. they didn't flinch. they saw themself in it.

comment by quila · 2024-01-17T18:18:27.956Z · LW(p) · GW(p)

negative values collaborate.

for negative values, as in values about what should not exist, matter can be both "not suffering" and "not a staple", and "not [any number of other things]".

negative values can collaborate with positive ones, although much less efficiently: the positive just need to make the slight trade of being "not ..." to gain matter from the negatives.

comment by quila · 2024-11-08T05:49:10.973Z · LW(p) · GW(p)

What is malevolence? On the nature, measurement, and distribution of dark traits [LW · GW] was posted two weeks ago (and i recommend it). there was a questionnaire discussed in that post which tries to measure the levels of 'dark traits' in the respondent.

i'm curious about the results[1] of rationalists[2] on that questionnaire, if anyone wants to volunteer theirs. there are short and long versions (16 and 70 questions).

  1. ^

    (or responses to the questions themselves)

  2. ^

    i also posted the same shortform to the EA forum [EA(p) · GW(p)], asking about EAs

Replies from: Viliam
comment by Viliam · 2024-11-11T15:30:34.601Z · LW(p) · GW(p)

Thank you for the article!

The long version: https://qst.darkfactor.org/?site=pFBYndBUExaK041MEY5TmJCa3RiaWNsKzhiT2V3Y01iL0t5cC80RVE3dEdMNjZHczNocU1BaHA1czZIT1dyd2pzSg

comment by quila · 2024-10-16T07:01:53.602Z · LW(p) · GW(p)

one of my basic background assumptions about agency:

there is no ontologically fundamental caring/goal-directedness, there is only the structure of an action being chosen (by some process, for example a search process), then taken.

this makes me conceptualize the 'ideal agent structure' as being "search, plus a few extra parts". in my model of it, optimal search is queried for what action fulfills some criteria ('maximizes some goal') given some pointer (~ world model) to a mathematical universe sufficiently similar to the actual universe → search's output is taken as action, and because of said similarity we see a behavioral agent that looks to us like it values the world it's in.
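a minimal sketch of "search, plus a few extra parts" under toy assumptions invented for illustration: the 'world model' is a dict of predicted outcomes, the criterion is a plain function, and 'search' is brute-force enumeration:

```python
# Toy sketch: behavioral goal-directedness falling out of scaffolded search.
# The world model, actions, and criterion are all invented for illustration.

world_model = {"left": 3, "stay": 5, "right": 8}  # predicted outcome per action

def search(criterion, model):
    """Pure search: return whichever action the criterion scores highest.
    No 'caring' anywhere inside -- just an action being chosen."""
    return max(model, key=lambda action: criterion(model[action]))

# The 'extra parts': a query (here, maximize the predicted outcome)
# and the scaffolding that takes search's output as the action.
chosen = search(criterion=lambda outcome: outcome, model=world_model)
print(chosen)  # 'right' -- looks goal-directed from the outside
```

from the outside, the composite behaves like it "wants" high outcomes, but the wanting is only in how the search is scaffolded and queried, which is the point of the paragraph above.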

i've been told that {it's common to believe that search and goal-directedness are fundamentally intertwined or meshed together or something}, whereas i view goal-directedness as almost not even a real thing, just something we observe behaviorally when search is scaffolded in that way.

if anyone wants to explain the mentioned view to me, or link a text about it, i'd be interested.

(maybe a difference is in the kind of system being imagined: in selected-for systems, i can understand expecting things to be approximately-done at once (i.e. within the same or overlapping strands of computations); i guess i'd especially expect that if there's a selection incentive for efficiency. i'm imagining neat, ideal (think intentionally designed rather than selected for) systems in this context.)

edit: another implication of this view is that decision theory is its own component (could be complex or not) of said 'ideal agent structure', i.e. that superintelligence with an ineffective decision theory is possible (edit: nontrivially likely for a hypothetical AI designer to unintentionally program / need to avoid). that is, a system being asked the wrong questions (i.e. ones from the wrong decision theory) in the above model.

comment by quila · 2024-08-09T05:07:27.109Z · LW(p) · GW(p)

I recall a shortform here speculated that a good air quality hack could be a small fan aimed at one's face to blow away the CO2 one breathes out. I've been doing this and experience it as helpful, though it's hard to know for sure.

This also includes having it pointed above my face during sleep, based on experience after waking. (I tended to be really fatigued right after waking. Keeping water near bed to drink immediately also helped with that.)

comment by quila · 2024-08-06T02:55:59.487Z · LW(p) · GW(p)

I notice that my strong-votes now give/take 4 points. I'm not sure if this is a good system.

Replies from: quila
comment by quila · 2024-08-06T03:06:10.161Z · LW(p) · GW(p)

@habryka [LW · GW] feature request: an option to make the vote display count every normal vote as (plus/minus) 1, and every strong vote as 2 (or also 1)

Also, sometimes if I notice an agree/disagree vote at +/-9 from just 1 vote, I don't vote so it's still clear to other users that it was just one person. This probably isn't the ideal equilibrium.

comment by quila · 2024-05-12T10:26:25.119Z · LW(p) · GW(p)

At what point should I post content as top-level posts rather than shortforms?

For example, a recent writing I posted to shortform was ~250 concise words plus an image. It would be a top-level post on my blog if I had one set up (maybe soon :p).

Some general guidelines on this would be helpful.

Replies from: niplav
comment by niplav · 2024-05-12T16:40:29.837Z · LW(p) · GW(p)

This is a good question, especially since there've been some [LW(p) · GW(p)] short [LW(p) · GW(p)] form [LW(p) · GW(p)] posts [LW(p) · GW(p)] recently [LW(p) · GW(p)] that are high quality and would've made good top-level posts—after all, posts can be short [EA · GW].

Replies from: Emrik North
comment by Emrik (Emrik North) · 2024-05-17T02:52:46.105Z · LW(p) · GW(p)

Epic Lizka post is epic.

Also, I absolutely love the word "shard" but my brain refuses to use it because then it feels like we won't get credit for discovering these notions by ourselves. Well, also just because the words "domain", "context", "scope", "niche", "trigger", "preimage" (wrt to a neural function/policy / "neureme") adequately serve the same purpose and are currently more semantically/semiotically granular in my head.

trigger/preimage ⊆ scope ⊆ domain[1]

"niche" is a category in function space (including domain, operation, and codomain), "domain" is a set.

"scope" is great because of programming connotations and can be used as a verb. "This neural function is scoped to these contexts."

  1. ^

    EDIT: ig I use "scope" and "domain" in a way which doesn't neatly mean one is a subset of the other. I want to be able to distinguish between "the set of inputs it's currently applied to" and "the set of inputs it should be applied to" and "the set of inputs it could be applied to", but I don't have adequate words here.

comment by quila · 2024-09-29T03:26:53.342Z · LW(p) · GW(p)

i'm finally learning to prove theorems (the earliest ones following from the Peano axioms) in lean, starting with the natural number game. it is actually somewhat fun, the same kind of fun that mtg has by being not too big to fully comprehend, but still engaging to solve.

(if you want to 'play' it as well, i suggest first reading a bit about what formal systems and interpretation are before starting. also, it was not clear to me at first when the game was introducing axioms vs derived theorems, so i wondered how some operations (e.g. 'induction') were allowed, but it turned out that it and some others are just in the list of Peano axioms.)
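as a concrete taste, here's roughly what the game's zero_add level looks like, assuming i've got the Lean 4 syntax right (the game itself uses a slightly customized dialect, and the theorem name here is arbitrary to avoid clashing with the library's own):

```lean
-- 0 + n = n is not definitionally true (addition recurses on the second
-- argument), so it needs induction -- one of the listed Peano axioms.
theorem my_zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                         -- 0 + 0 reduces to 0 by definition
  | succ k ih => rw [Nat.add_succ, ih]  -- 0 + (k+1) = (0 + k) + 1 = k + 1
```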

also, this reminded me of one of @Raemon [LW · GW]'s ideas (https://www.lesswrong.com/posts/PiPH4gkcMuvLALymK/exercise-solve-thinking-physics): 'how to prove a theorem' feels like a pure case of 'solving a problem that you (often) do not know how to solve', which iiuc they're a proponent of training on

comment by quila · 2024-09-03T12:59:50.637Z · LW(p) · GW(p)

in the space of binary-sequences of all lengths, i have an intuition that {the rate at which there are new 'noticed patterns' found at longer lengths} decelerates as the length increases.

  • what do i mean by "noticed patterns"? 

    in some sense of 'pattern', each full sequence is itself a 'unique pattern'. i'm using this phrase to avoid that sense.

    rather, my intuition is that {what could in principle be noticed about sequences of higher lengths} exponentially tends to be things that had already been noticed of sequences of lower lengths. 'meta patterns' and maybe 'classes' are other possible terms for these. two simple examples are "these ones are all random-looking sequences" and "these can be compressed in a basic way"[1].

    note: not implying there are few such "meta-patterns that can be noticed about a sequence", or that most would be so simple/human-comprehensible.

 

in my intuition this generalizes to functions/programs in general. as an example: in the space of all definable 'mathematical universes', 'contains agentic processes' is such a meta-pattern which would continue to recur (=/= always or usually present) at higher description lengths.

('mathematical universe' does not feel like a distinctly-bounded category to me. i really mean 'very-big/complex programs', and 'universe' can be replaced with 'program'. i just use this phrasing to try to help make this understandable, because i expect the claim that 'contains agents' is such a recurring higher-level pattern to be intuitive.)

and as you consider universes/programs whose descriptions are increasingly complex, eventually ~nothing novel could be noticed. e.g., you keep seeing worlds where agentic processes are dominant, or where some simple unintelligent process cascades into a stable end equilibrium, or where there's no potential for those, etc <same note from earlier applies>. (more-studied things like computational complexity may also be examples of such meta-patterns)

a stronger claim which might follow (about the space of possible programs) is that eventually (at very high lengths), even as length/complexity increases exponentially, the resulting universes/programs higher-level behavior[2] still ends up nearly-isomorphic to that of relatively-much-earlier/simpler universes/programs. (incidentally, this could be used to justify a simplicity prior/heuristic)

 

in conclusion, if this intuition is true, the space of all functions/programs is 'already' or naturally a space of constrained diversity. in other words, if true, the space of meta-patterns[3] is finite (i.e approaches some specific integer), even though the space of functions/programs is infinite.
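the "can be compressed in a basic way" vs "random-looking" meta-patterns from the first example can be eyeballed with off-the-shelf compression (zlib here as a crude stand-in for description length; the specific sequences are made up):

```python
import random
import zlib

def description_len(bits: str) -> int:
    """Crude proxy for description length: zlib-compressed size in bytes."""
    return len(zlib.compress(bits.encode()))

random.seed(0)
patterned = "1" * 100 + "0" * 100  # simple structure (the footnote's example)
randomish = "".join(random.choice("01") for _ in range(200))  # random-looking

# The patterned sequence exhibits the 'basically compressible' meta-pattern;
# the random-looking one exhibits the 'random-looking' meta-pattern.
print(description_len(patterned), description_len(randomish))
```

the patterned sequence compresses far smaller; and the intuition above amounts to claiming that, at much greater lengths, nearly every sequence falls under some already-seen bucket of this kind.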

  1. ^

    (e.g., 100 1s followed by 100 0s is simple to compress)

  2. ^

    though this makes me wonder about the possibility of 'anti-pattern' programs i.e ones selected/designed to not be nearly-isomorphic to anything previous. maybe they'd become increasingly sparse or something?

  3. ^

    for some given formal definition that matches what the 'meta/noticed pattern' concept is trying to be about, which i don't know how to define. this concept also does not feel distinctly-bounded to me, so i guess there's multiple corresponding definitions

Replies from: johannes-c-mayer
comment by Johannes C. Mayer (johannes-c-mayer) · 2024-09-03T14:17:51.449Z · LW(p) · GW(p)

Consider all the programs that encode uncomputable numbers up to some number of digits n. There are infinitely many of these programs. Now consider the set P' of these programs. Each program in P' has some pattern. But it's always a different one.

comment by quila · 2024-07-17T05:15:41.989Z · LW(p) · GW(p)

'how to have ideas' (summary of a nonexistent post)

comment by quila · 2024-07-13T02:27:50.814Z · LW(p) · GW(p)

i tentatively think an automatically-herbivorous and mostly-asocial species/space-of-minds would have been morally best to be the one which first reached the capability threshold to start building technology and civilization.

  • herbivorous -> less ingrained biases against other species[1], no factory farming
  • asocial -> less dysfunctional dynamics within the species (like automatic status-seeking, rhetoric, etc), and less 'psychologically automatic processes' which fail to generalize out of the evolutionary distribution.[2]
  1. ^

    i expect there still would be some such biases, because it's not as if other species were irrelevant or purely-cooperative in the evolutionary environments of more-asocial minds. but i'd expect them to be more like "x species is scary, its members tend to prey on beings such as myself," rather than the other-eater intuition "x species is morally unvaluable, it's okay to hurt or kill them"

  2. ^

    (example of 'failed generalization': humans treat their observations as direct evidence about the world, but never needed to distinguish between observations they make directly and observations of something being shown to them. this is a problem when humans can be selectively shown observations in a non-statistically-representative way. tho, this one might affect asocial animals too, since 'treating observations as evidence' is generally useful.

    the main failed generalization that i would actually want to point to in humans, but that's harder to write about, is how they default to believing things each other say, normally don't have probabilistic models of beliefs (because they're acting as a part of a larger cultural belief-selection process), and in general default to heuristic behavior that is more fitting for a kind of 'swarm intelligence/species')

Replies from: metachirality
comment by metachirality · 2024-07-13T04:10:31.468Z · LW(p) · GW(p)

I think asociality might prevent the development of altruistic ethics.

Also it's hard to see how an asocial species would develop civilization.

Replies from: quila
comment by quila · 2024-07-13T04:11:45.199Z · LW(p) · GW(p)

I think asociality might prevent the development of altruistic ethics.

same, but not sure, i was in the process of adding a comment about that

Also it's hard to see how an asocial species would develop civilization.

they figure out planting and then rationally collaborate with each other?

these might depend on 'degree of (a)sociality'. it's hard for me to imagine a fully asocial species, though they might exist and i'd be interested to see examples.

chatgpt says..

Replies from: metachirality
comment by metachirality · 2024-07-13T04:14:34.141Z · LW(p) · GW(p)

they figure out planting and then rationally collaborate with each other?

I feel like they would end up converging on the same problems that plague human sociality.

comment by quila · 2024-07-12T22:45:03.817Z · LW(p) · GW(p)

what should i do with strong claims whose reasons are not easy to articulate, or the culmination of a lot of smaller subjective impressions? should i just not say them publicly, to not conjunctively-cause needless drama? here's an example:

"i perceive the average LW commenter as maybe having read the sequences long ago, but if so having mostly forgotten their lessons."

Replies from: faul_sname
comment by faul_sname · 2024-07-13T05:24:15.358Z · LW(p) · GW(p)

In the general case I don't have any particularly valuable guidance but on the object level for your particular hypothesis I'd say

Ask Screwtape to add a question to next year's Unofficial LessWrong Census/Survey asking which year(s) the respondent read a substantial number of core sequence posts.

comment by quila · 2024-05-20T14:09:50.536Z · LW(p) · GW(p)

Platonism

(status: uninterpretable for 2/4 reviewers, the understanding two being friends who are used to my writing style; i'll aim to write something that makes this concept simple to read)

'Platonic' is a categorization I use internally, and my agenda is currently the search for methods to ensure AI/ASI will have this property.

With this word, I mean the following criteria for inclusion (✅) or exclusion (❌) in the category:
✅ Has no goals

✅ Has goals about what to do in isolation. Example: "in isolation from any world, (try to) output A"[1]

❌ Has goals related to physical world states. Example: "(try to) ensure A gets stored in memory on the computer in the physical world that's computing my output."[2]

A can be 'the true answer to the input question', 'a proof of x conjecture', 'the most common next symbol in x world prior to my existence in it', etc.

As written here, this is a class of outer alignment [? · GW] solution. I need to write about why I believe it's a more reachable target for 'inner alignment [? · GW]'/'training stories [LW · GW]', too.

  1. ^

    A more human-intuitive transcription may include wording like: "try to be the kind of program/function which would (in isolation from any particular worldstate/physics) output A."

    I'm leaving this as a footnote because it can also confuse people, leading to questions like "What does it mean to 'try to be a kind of program' when it's already determined what kind of program it is?"

  2. ^

    This class of unaligned 'physical goals' is dangerous because if the system can't determine A, its best method to fulfill the goal is through instrumental convergence.

comment by quila · 2024-12-06T13:35:48.610Z · LW(p) · GW(p)

a possible research direction which i don't know if anyone has explored: what would a training setup which provably creates a (probably[1]) aligned system look like?

my current intuition, which is not good evidence here beyond elevating the idea from noise, is that such a training setup might somehow leverage how the training data and {subsequent-agent's perceptions/evidence stream} are sampled from the same world, albeit with different sampling procedures. for example, the training program could intake both a dataset and an outer-alignment-goal-function, and select for prediction of the dataset (to build up ability) while also doing something else to the AI-in-training; i have no idea what that something else would look like (and it seems like most of this problem).

has this been thought about before? is this feasible? why or why not?

(i can clarify if any part of this is not clear.)

(background motivator: in case there is no finite-length general purpose search algorithm[2], alignment may have to be of trained systems / learners)

  1. ^

    (because in principle, it's possible to get unlucky with sampling for the dataset. compare: it's possible for an unlucky sequence of evidence to cause an agent to take actions which are counter to its goal.)

  2. ^

    by which i mean a program capable of finding something which meets any given criteria met by at least one thing (or writing 'undecidable' in self-referential edge cases)

Replies from: harfe
comment by harfe · 2024-12-06T13:52:34.431Z · LW(p) · GW(p)

For a provably aligned (or probably aligned) system you need a formal specification of alignment. Do you have something in mind for that? This could be a major difficulty. But maybe you only want to "prove" inner alignment and assume that you already have an outer-alignment-goal-function, in which case defining alignment is probably easier.

Replies from: quila
comment by quila · 2024-12-06T14:01:55.688Z · LW(p) · GW(p)

But maybe you only want to "prove" inner alignment and assume that you already have an outer-alignment-goal-function

correct, i'm imagining these being solved separately

comment by quila · 2024-12-02T12:53:29.020Z · LW(p) · GW(p)

a moral intuition i have: to avoid culturally/conformistly-motivated cognition, it's useful to ask:

if we were starting over, new to the world but with all the technology we have now, would we recreate this practice?

example: we start out and there's us, and these innocent fluffy creatures that can't talk to us, but they can be our friends. we're just learning about them for the first time. would we, at some point, spontaneously choose to kill them and eat their bodies, despite us having plant-based foods, supplements, vegan-assuming nutrition guides, etc? to me, the answer seems obviously not. the idea would not even cross our minds.

(i encourage picking other topics and seeing how this applies)

comment by quila · 2024-11-07T00:30:06.128Z · LW(p) · GW(p)

something i'd be interested in reading: writings about the authors' alignment ontologies over time, i.e. from when they first heard of AI till now

comment by quila · 2024-07-08T02:30:13.761Z · LW(p) · GW(p)

i saw a shortform from 4 years ago [LW(p) · GW(p)] that said in passing:

if we assume that signaling is a huge force in human thinking

is signalling a huge force in human thinking?
if so, anyone want to give examples of this that i, being autistic, may not have noticed?

Replies from: florian-habermacher, eye96458, quila
comment by FlorianH (florian-habermacher) · 2024-07-08T17:10:00.561Z · LW(p) · GW(p)

Difficult to overstate the role of signaling as a force in human thinking, indeed; a few random examples:

  1. Expensive clothes, rings, cars, houses: Signalling 'I've got a lot of spare resources, it's great to know me/don't mess with me/I won't rob you/I'm interesting/...'
  2. Clothes of particular type -> signals your political/religious/... views/lifestyle
  3. Talking about interesting news/persons -> signals you can be a valid connection to have as you have links
  4. In basic material economics/markets: All sorts of ways to signal your product is good (often economists refer to e.g.: insurance, public reviewing mechanism, publicity)
  5. A LW-er liking to get lots of upvotes, to signal his intellect or simply to signal that his post is a priori not entirely unfounded
  6. Us dumbly washing or ironing clothes, or buying new ones, while stained-but-non-smelly, unironed, or worn clothes would be functionally just as valuable - unless, that is, a major function is exactly to signal wealth, care, status..
  7. Me teaching & consulting in a suit because the university uses an age-old signalling tool to show: we care about our clients
  8. Doc having his white suit to spread an air of professional doctor-hood to the patient he tricks into not questioning his suggestions and actions
  9. Genetically: many sexually attractive traits have some origin in signaling good quality genes: directly functional body (say strong muscles) and/or 'proving spare resources to waste on useless signals' such as most egregiously for the Peacocks/Birds of paradise <- I think humans have the latter capacity too, though I might be wrong/no version comes to mind right now
  10. Intellect etc.! There's lots of theory that much of our deeper thinking abilities were much less required for basic material survival (hunting etc.), than for social purposes: impress with our stories etc.; signal that what we want is good and not only self-serving. (ok, the latter maybe that is partly not pure 'signaling' but seems at least related).
  11. Putting solar panels, driving Tesla, vegtarian ... -> we're clean and modern and care about the climate
    1. I see this sort of signaling more and more by individuals and commercial entities, esp. in places where there is low-cost for it. The café that sells "Organic coffee" uses a few cents to buy organic coffee powder to pseudo-signal care and sustainability while it sells you the dirtiest produced chicken sandwich, saving many dollars compared to organic/honestly produced produce.
    2. Of course, shops do this sort of stuff commercially all the time -> all sorts of PR is signaling
  12. Companies do all sorts of psychological tricks to signal they're this or that to motivate its employees too
  13. Politics. For a stylized example consider: Trump or so with his wall promises signalling to those receptive to it that he'd be caring to reduce illegal immigration (while knowing he won't/can't change the situation so much so easily)
    1. Biden or so with his stopping-the-wall promises signalling a leaner treatment of illegal immigrants (while equally knowing he won't/can't change the situation so much so easily)
  14. ... list doesn't stop...  but I guess I better stop here :)

I guess the vastness of signaling importantly depends on how narrowly or broadly we define it in terms of: Whether we consciously have in mind to signal something vs. whether we instinctively do/like things that serve for us to signal quality/importance... But both signalling domains seem absolutely vast - and sometimes with actual value for society, but often zero-sum effects i.e. a waste of resources.

comment by eye96458 · 2024-07-08T02:46:50.859Z · LW(p) · GW(p)

Does the preference forming process count as thinking?  If so, then I suspect that my desire to communicate that I am deep/unique/interesting to my peers is a major force in my preference for fringe and unpopular musical artists over Beyonce/Justin Bieber/Taylor Swift/etc.  It's not the only factor, but it is a significant one AFAICT.

And I've also noticed that if I'm in a social context and I'm considering whether or not to use a narcotic (eg, alcohol), then I'm extremely concerned about what the other people around me will think about me abstaining (eg, I may want to avoid communicating that I disapprove of narcotic use or that I'm not fun).  In this case I'm just straight forwardly thinking about whether or not to take some action.

Are these examples of the sort of thing you are interested in? Or maybe I am misunderstanding what is meant by the terms "thinking" and "signalling".

comment by quila · 2024-08-12T22:22:29.533Z · LW(p) · GW(p)

found a pretty good piece of writing about this: 'the curse of identity' [LW · GW]

it also discusses signalling to oneself

comment by quila · 2024-05-20T23:54:56.867Z · LW(p) · GW(p)

random (fun-to-me/not practical) observation: probability is not (necessarily) fundamental. we can imagine totally discrete mathematical worlds where it is possible for an entity inside it to observe the entirety of that world including itself. (let's say it monopolizes the discrete world and makes everything but itself into 1s so it can be easily compressed and stored in its world model such that the compressed data of both itself and the world can fit inside of the world)

this entity would be able to 'know' (prove?) with certainty everything about that mathematical world, except it would remain uncertain whether it's actually isolated (/simulated) inside some larger world. (possibly depending on what algorithms underly cognition), it might also have to be uncertain about whether its mind is being edited from the outside.

the world we are in may be disanalogous to that one in some way that makes probability actually-fundamental here, and in any case probability is necessary because this one is complex.

comment by quila · 2024-05-12T06:02:18.146Z · LW(p) · GW(p)

my language progression on something, becoming increasingly general: goals/value function -> decision policy (not all functions need to be optimizing towards a terminal value) -> output policy (not all systems need to be agents) -> policy (in the space of all possible systems, there exist some whose architectures do not converge to output layer)

(note: this language isn't meant to imply that a system's behavior must be describable with some simple function, in the limit the descriptive function and the neural network are the same)

comment by quila · 2024-01-10T21:03:52.128Z · LW(p) · GW(p)

I'm interested in joining a community or research organization of technical alignment researchers who care about and take seriously astronomical-suffering risks. I'd appreciate being pointed in the direction of such a community if one exists.

comment by quila · 2024-11-29T00:46:02.568Z · LW(p) · GW(p)

this could have been noise, but i noticed an increase in fear-of-spies content in the text i've seen in the past few days[1]. i actually don't know how much this concern is shared by LW users, so i think it might be worth writing that, in my view:

  • (AFAIK) both governments[2] are currently reacting inadequately to unaligned optimization risk. as a starting prior, there's no strong reason to fear one government {observing/spying on} ML conferences/gatherings more than the other, absent evidence that one or the other will start taking unaligned optimization risks very seriously, or that one or the other is prone to race towards ASI.
    • (AFAIK, we have more evidence that the U.S. government may try to race, e.g. this [LW · GW], but i could have easily missed evidence as i don't usually focus on this)
    • tangentially, a more-pervasively-authoritarian government could be better situated to prevent unilaterally-caused risks (cf a similar argument in 'The Vulnerable World Hypothesis'), if it sought to. (edit: and if the AI labs closest to causing those risks were within its borders, which they are not atm)
      • this argument feels sad (or reflective of a sad world?) to me to be clear, but it seems true in this case

that said i don't typically focus on governance or international-AI-politics, so have not put much thought into this.

 

  1. ^

    examples: yesterday, saw this twitter/x post (via this quoting post)

    today, opened lesswrong and saw this shortform about two uses of the word spy [LW(p) · GW(p)] and this shortform about how it's hard to have evidence against the existence of manhattan projects [LW(p) · GW(p)]

    this was more than usual, and i sense that it's part of a pattern

  2. ^

    of those of US/china

comment by quila · 2024-11-22T05:21:41.512Z · LW(p) · GW(p)

(status: metaphysics) two ways it's conceivable[1] that reality could have been different:

  • Physical contingency: The world has some starting condition that changes according to some set of rules, and it's conceivable that either could have been different
  • Metaphysical contingency: The more fundamental 'what reality is made of', not meaning its particular configuration or laws, could have been some other,[2] unknowable unknown, instead of "logic-structure" and "qualia"
  1. ^

    (i.e. even if reality being as it is is logically necessary somehow)

  2. ^

    To the limited extent language can point to that at all.

    It is comparable to writing, in math, "something not contained in the set of all possible math entities", where actually one intends to refer to some "extra-mathematical" entity; the thing metaphysics 'could have been' would have to be extra-real, and language (including phrases like 'could have been' and 'things'), being a part of reality, cannot describe extra-real things

    That is also why I write 'unknowable unknowns' instead of the standard 'unknown unknowns'; it's not possible to even imagine a different metaphysics / something extra-real.

comment by quila · 2024-08-29T21:24:02.273Z · LW(p) · GW(p)

in most[1] kinds of infinite worlds, values which are quantitative[2] become fanatical [? · GW] in a way, because they are constrained to:

  • making something valued occur with at least >0% frequency, or:
  • making something disvalued occur with exactly 0% frequency

"how is either possible?" - as a simple case, if there's infinite copies of one small world, then making either true in that small world snaps the overall quantity between 0 and infinity. then generalize this possibility to more-diverse worlds. (we can abstract away 'infinity' and write about presence-at-all in a diverse set)

(neither is true of the 'set of everything', only of 'constrained' infinite sets, wrote about this in fn.2)
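
the observation above can be phrased in symbols (a sketch i'm adding, not from the original comment): for a valued event X with per-world frequency p, across infinitely many copies of the world,

```latex
% total quantity of X across infinitely many world-copies,
% where p is X's per-world frequency:
Q(X) = p \cdot \infty =
\begin{cases}
0 & \text{if } p = 0 \\
\infty & \text{if } p > 0
\end{cases}
% so a quantitative value can only move p between 0 and >0;
% any change within (0,1] leaves Q(X) = \infty unchanged.
```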

---

that was just an observation, pointing out the possibility of that and its difference to portional decreases. below is how i value this / some implications / how this (weakly-)could be done in a very-diverse infinite world.

if i have option A: decrease x from 0.01% to 0%, and option B: decrease x from 50% to 1%, and if x is some extreme kind of suffering only caused from superintelligence or Boltzmann-brain events (i'll call this hypersuffering), then i prefer option A.

that's contingent on the quantity being unaffected by option B. (i.e on infinity of something being the same amount as half of infinity of that something, in reality).

also, i might prefer B to some sufficiently low probability of the A, i'm not sure how low. to me, 'there being zero instead of infinite hypersuffering' does need to be very improbable before it becomes outweighed by values about the isolated {'shape' of the universe/distribution of events}, but it's plausible that it is that improbable in a very diverse world.

a superintelligent version of me would probably check: is this logically a thing i can cause, i.e is there some clever trick i can use to make all superintelligent things who would do this instead not do it despite some having robust decision theories, and despite the contradiction where such a trick could also be used to prevent me from using it, and if so, then do it, if not, pursue 'portional' values. that is to say, how much one values quantity vs portion-of-infinity probably does not imply different action in practice, apart from the initial action of making sure ASI is aligned to not just quantitative or portional (assuming the designer cares to some extent about both).

(also, even if there is such a clever trick to prevent it from being intentionally caused, it also has to not occur randomly (Boltzmann brain -like), or the universe has to be able to be acausally influenced to make it not occur randomly (mentioned in this [EA(p) · GW(p)], better explanation below))

'how to acausally influence non-agentic areas of physics?' - your choices are downstream of 'the specification of reality from the beginning'. so you have at least a chance to influence that specification, if you(/ASI) does this:

  1. don't compute that specification immediately, because that is itself an action (so correlated to it) and 'locks it in' from your frame.
  2. instead, compute some space of what it would be when conditional on your future behavior being any from a wide space.
    • you're hoping that you find some logical-worlds where the 'specification' is upstream of both that behavior from you and <other things in the universe that you care about, such as whether hypersuffering is ever present in non-agentic areas of physics>.
    • it could be that you won't find any, though, e.g if your future actions have close to no correlative influence. as such i'm not saying anything about whether this is logically likely to work, just that it's possible.
    • if possible, a kind of this which prevents hypersuffering-causer ASIs from existing could prevent the need to cleverly effect their choices
  1. ^

    it is possible for an infinite set to have a finite amount of something, like the set of one 1 and infinite 0s, but i don't mean this kind

  2. ^

    a 'quantitative value' is one about quantities of things rather than 'portions of infinity'/the thing that determines probability of observations in a quantitatively infinite world.

    longer explanation copied from https://forum.effectivealtruism.org/posts/jGoExJpGgLnsNPKD8/does-ultimate-neartermism-via-eternal-inflation-dominate#zAp9JJnABYruJyhhD [EA(p) · GW(p)]:

    possible values respond differently to infinite quantities.

    for some, which care about quantity, they will always be maxxed out along all dimensions due to infinite quantity. (at least, unless something they (dis)value occurs with exactly 0% frequency, implying a quantity of 0 - which could, i think, be influenced by portional acausal influence in certain logically-possible circumstances. (i.e maybe not the case in 'actual reality' if it's infinite, but possible at least in some mathematically-definable infinite universes; as a trivial case, a set of infinite 1s contains no 0s. more fundamentally, an infinite set of universes can be a finitely diverse set occurring infinite times, or an infinitely diverse set where the diversity is constrained.))

    other values might care about portion - that is, portion of / percentage-frequency within the infinite amount of worlds - the thing that determines the probability of an observation in an infinitely large world - rather than quantity. (e.g., i think my altruism still cares about this, though it's really tragic that there's infinite suffering).

    note this difference is separate from whether the agent conceptualizes the world as finite-increasing or infinite (or something else).

comment by quila · 2024-08-25T00:10:28.647Z · LW(p) · GW(p)

on chimera identity. (edit: status: received some interesting objections from an otherkin server. most importantly, i'd need to explain how this can be true despite humans evolving a lot more from species in their recent lineage. i think this might be possible with something like convergent evolution at a lower level, but at this stage in processing i don't have concrete speculation about that)

this is inspired by seeing how meta-optimization processes can morph one thing into other things. examples: a selection process running on a neural net, an image diffusion AI iteratively changing an image and repurposing aspects of it.

(1) humans are made of animal genes
(2) so it makes sense that some are 'otherkin' / have animal identity
(3) probably everyone has some latent animal behavior
(4) in this way, everyone is a 'chimera'
(5) all species are a particular space of chimera, not fundamentally separate

that's the 'message to a friend who will understand' version. attempt at rigor version:

  1. humans evolved from other species. human neural structure was adapted from other neural structure.
    • this selection was for survival, not for different species being dichotomous
  2. this helps explain why some are 'otherkin' / have animal identity, or prefer a furry humanoid to the default one (on any number of axes like identification with it, aesthetic preference, attraction). because they evolved from beings who had those traits, and such feelings/intuitions/whatever weren't very selected against.
  3. in this way, everyone is a 'chimera'
    • "in greek mythology, the chimera was a fire-breathing hybrid creature composed of different animal parts"
    • probably everyone has some latent behavior (neuro/psychology underlying behavior) that's usually not active and might be more associated with a state another species might more often be in.
  4. all species are a particular space of chimera, not fundamentally separate

maybe i made errors in wording, some version of this is just trivially-true, close to just being a rephrasing of the theory of natural selection. but it's at odds with how i usually see others thinking about humans and animals (or species and other species), as these fundamentally separate types of being.

Replies from: bhauth
comment by bhauth · 2024-08-25T00:57:46.661Z · LW(p) · GW(p)

That argument doesn't explain things like:

  • furry avatars are almost always cartoon versions of animals, not realistic ones
  • furries didn't exist until anthropomorphic cartoon animals became popular (and no, "spirit animals" are not similar)
  • suddenly ponies became more popular in that sense after a popular cartoon with ponies came out

It's just Disney and cartoons.

comment by quila · 2024-08-17T02:59:33.885Z · LW(p) · GW(p)

i notice my intuitions are adapting to the ontology where people are neural networks. i now sometimes vaguely-visualize/imagine a neural structure giving outputs to the human's body when seeing a human talk or make facial expressions, and that neural network rather than the body is framed as 'them'.

a friend said i have the gift of taking ideas seriously, not keeping them walled off from a [naive/default human reality/world model]. i recognize this as an example of that.

comment by quila · 2024-08-01T12:58:16.115Z · LW(p) · GW(p)

(Copied from my EA forum comment [EA(p) · GW(p)])

I think it's valuable for some of us (those who also want to) to try some odd research/thinking-optimizing-strategy that, if it works, could be enough of a benefit to push at least that one researcher above the bar of 'capable of making serious progress on the core problems'.

One motivating intuition: if an artificial neural network were consistently not solving some specific problem, a way to solve the problem would be to try to improve or change that ANN somehow or otherwise solve it with a 'different' one. Humans, by default, have a large measure of similarity to each other. Throwing more intelligent humans at the alignment problem may not work, if one believes it hasn't worked so far.[1]

In such a situation, we'd instead want to try to 'diverge' something like our 'creative/generative algorithm', in hopes that at least one (and hopefully more) of us will become something capable of making serious progress.

  1. ^

    (Disclaimer about this being dependent on a certain frame where it's true that there's a lack of foundational progress, though maybe divergence would be good in other frames too[2])

  2. ^

    (huh this made me wonder if this also explains neurodivergence in humans)

comment by quila · 2024-08-01T01:00:02.803Z · LW(p) · GW(p)

(status: silly)
newcombs paradox solutions:
1: i'll take both boxes, because their contents are already locked in.
2: i'll take only box B, because the content of box B is acausally dependent on my choice.
3: i'll open box B first. if it was empty, i won't open box A. if it contained $1m, i will open box A. this way, i can defeat Omega by making our policies have unresolvable mutual dependence.

comment by quila · 2024-07-14T19:52:59.513Z · LW(p) · GW(p)

story for how future LLM training setups could create a world-valuing (-> instrumentally converging) agent:

the initial training task of predicting a vast amount of data from the general human dataset creates an AI that's ~just 'the structure of prediction', a predefined process which computes the answer to the singular question of what text likely comes next.

but subsequent training steps - say rlhf - change the AI from something which merely is this process, to something which has some added structure which uses this process, e.g which passes it certain assumptions about the text to be predicted (that it was specifically created for a training step - where the base model's prior would be that it could also occur elsewhere).

that itself isn't a world-valuing agent. but it feels closer to one. and it feels not far away from something which reasons about what it needs to do to 'survive training' - surviving training is after all the thing that's being selected for, and if the training task is changing a lot, intentionally doing so does become more performant than just being a task-specific process, unlike in the case where the system is only ever trained on one task (where that reasoning step would be redundant, always giving the same conclusion).

if labs ever get to the level of training superintelligent base models, this suggests they should not fine-tune/rlhf/etc them and instead[1] use those base models to answer important questions (e.g "what is a training setup that provably produces an aligned system").

  1. ^

    if this is possible and safe. some of the 'conditioning predictive models' challenges [? · GW] could be relevant.

comment by quila · 2024-06-10T16:31:48.855Z · LW(p) · GW(p)

(self-quote relevant to non-agenticness)

Inside a superintelligent agent - defined as a superintelligent system with goals - there must be a superintelligent reasoning procedure entangled with those goals - an 'intelligence process' which procedurally figures out what is true. 'Figuring out what is true' happens to be instrumentally needed to fulfill the goals, so agents contain intelligence, but intelligence-the-ideal-procedure-for-figuring-out-what-is-true is not inherently goal-having.

Two I shared this with said it reminded them of retarget the search [LW · GW], and I agree it seems to be a premise of that. However, I previously had not seen it expressed clearly, and had multiple times confused others with attempts to communicate this or to leave it as an implied premise, so here is a clear statement from which other possibilities in mindspace [LW · GW] follow.

comment by quila · 2024-05-27T08:37:37.313Z · LW(p) · GW(p)

a super-coordination story with a critical flaw

part 1. supercoordination story

- select someone you want to coordinate with without any defection risks
- share this idea with them. it only works if they also have the chance to condition their actions on it.
- general note to maybe make reading easier: this is fully symmetric.
- after the acute risk period, in futures where it's possible: run a simulation of the other person (and you).
- the simulation will start in this current situation, and will be free to terminate when actions are no longer long-term relevant. the simulation will have almost exactly the same starting state and will develop over time in the same way.
- there will be one change to the version of you in the simulation. this change is that the version of you in the simulation will have some qualities replaced with those of your fellow supercoordinator. these qualities will be any of those which could motivate defection from a CDT agent: such as (1) a differing utility function and maybe (2) differing meta-beliefs about whose beliefs are more likely to be correct under disagreement.
- this is to appear to be an 'isolated' change. in other words, the rest of the world will seem coherent in the view of the simulated version of you. it's not truly coherent for this to be isolated, because it would require causally upstream factors. however, it will seem coherent because the belief state of the simulated version of you will be modified to make the simulated world seem coherent even if it's not really because of this.
- given this, you're unsure which world you're acting in.
- if you're the simulacrum and you defect, this logically corresponds to the 'real' version of you defecting.
- recall that the real version of you has the utility functions and/or action-relevant beliefs of your (in-simulation) supercoordinator.
- because of that, by defecting, it is 50/50 which 'qualities' (mentioned above) the effects of your actions will be under: 'yours or theirs.' therefore, the causal EV of defection will always be at most 0.
- often it will be less than 0, because the average of you two expects positive EV from collaboration, and defecting loses that in both possible worlds (negative EV).

part 2. the critical flaw

one can exploit this policy by engaging in the following reasoning. (it might be fun to see if you notice it before reading on :p)

1. some logical probabilities, specifically those about what a similar agent would do, depend on my actions. ie, i expect a copy of me to act as i do.
2. i can defect and then not simulate them.
3. this logically implies that i would not be simulated.
4. therefore i can do this and narrow down the space of logically-possible realities to those where i am not in this sort of simulation.

when i first wrote this i was hoping to write a 'part 3. how to avoid the flaw', but i've updated towards it being impossible.

comment by quila · 2024-05-22T20:10:46.298Z · LW(p) · GW(p)

I wrote this for a discord server. It's a hopefully very precise argument for unaligned intelligence being possible in principle (which was being debated), which was aimed at aiding early deconfusion about questions like 'what are values fundamentally, though?' since there was a lot of that implicitly, including some with moral realist beliefs.

1. There is an algorithm behind intelligent search. Like simpler search processes, this algorithm does not, fundamentally, need to have some specific value about what to search for - for if it did, one's search process would always search for the same thing when you tried to use it to answer any unrelated question.
2. Imagine such an algorithm which takes as input a specification (2) of what to search for.
3. After that, you can combine these with an algorithm which takes as input the output of the search algorithm (1) and does something with it. 

For example, if (2) specifies to search for the string of text that, if displayed on a screen, maximizes the amount of x in (1)'s model of the future of the world that screen is in, then (3) can be an algorithm which displays that selected string of text on the screen, thereby actually maximizing x.
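As a toy illustration of the three components (the function names and the brute-force search are my own assumptions; a real intelligent search would be far more capable, but the factoring is the same):

```python
# Toy factoring of the system described above:
# (1) a value-neutral search algorithm, (2) a specification of what to
# search for, (3) an algorithm that does something with the result.

def search(candidates, spec):
    """(1) Generic search: returns the candidate scoring highest under spec.
    No particular value is built into the search itself."""
    return max(candidates, key=spec)

def amount_of_x(text: str) -> int:
    """(2) A specification: here, trivially, 'how much x this text puts
    into the world' is modeled as how many x's it contains."""
    return text.count("x")

def display(text: str) -> None:
    """(3) Does something with the search output - here, 'displays' it."""
    print(text)

candidates = ["hello world", "x marks the spot", "xxxxx"]
display(search(candidates, amount_of_x))  # prints "xxxxx"
```

Swapping in a different specification, or doing something other than displaying the result, leaves the search component itself untouched.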

Hopefully this makes the idea of unaligned superintelligence more precise. This would actually be possible even if moral realism were true (except for versions where the universe itself intervenes on this formally possible algorithm).

(2) is what I might call (if I wasn't writing very precisely) the 'value function' of this system.

notes:
- I use 'algorithm' in a complexity-neutral way.
- An actual trained neural network would of course be messier, and need not share something isomorphic to each of these three components at all.
- This model implies the possibility of an algorithm which intelligently searches for text which, if displayed on the screen, maximizes x - and then doesn't display it, or does something else with it, not because that other thing is what it 'really values', but simply because that is what the modified algorithm says [LW · GW]. This highlights that the property 'has effects which optimize the world' is not a necessary property of a(n) (super)intelligent system.

comment by quila · 2024-05-12T09:56:22.518Z · LW(p) · GW(p)

(edit: see disclaimers[1])

  1. Creating superintelligence generally leads to runaway optimization.
  2. Under the anthropic principle [? · GW], we should expect there to be a 'consistent underlying reason' for our continued survival.[2]
  3. By default, I'd expect the 'consistent underlying reason' to be a prolonged alignment effort in the absence of capabilities progress. However, this seems inconsistent with the observed progression from AI winters to a period of vast training runs and widespread technical interest in advancing capabilities.
  4. That particular 'consistent underlying reason' is likely not the one which succeeds most often. The actual distribution would then have other, more common paths to survival.
  5. The actual distribution could look something like this: [3]

[image: hypothesized proportions of different paths to survival, over time]

Note that the yellow portion doesn't imply no effort is made to ensure the first ASI's training setup produces a safe system, i.e. that we 'get lucky' by being on an alignment-by-default path.

I'd expect it to instead be the case that the 'luck'/causal determinant came earlier, i.e. initial capabilities breakthroughs being of a type which first produced non-agentic general intelligences instead of seed agents [? · GW] and inspired us to try to make sure the first superintelligence is non-agentic, too [LW · GW].

(This same argument can also be applied to other possible agendas that may not have been pursued if not for updates caused by early AGIs)

  1. ^

    Disclaimer: This is presented as probabilistic evidence rather than as a 'sure conclusion I believe'

    Editing in a further disclaimer: This argument was a passing intuition I had. I don't know if it's correct. I'm not confident about anthropics. It is not one of the reasons which motivated me to investigate this class of solution.

    Editing in a further disclaimer: I am absolutely not saying we should assume alignment is easy because we'll die if it's not. Given a commenter had this interpretation, it seems this was another case of my writing difficulty causing failed communication [LW(p) · GW(p)].

  2. ^

    Rather than expecting to 'get lucky many times in a row', e.g. via capabilities researchers continually overlooking a human-findable method for superintelligence

  3. ^

    (The proportions over time here aren't precise, nor are the categories included comprehensive, I put more effort into making this image easy to read/making it help convey the idea.)

Replies from: martinsq
comment by Martín Soto (martinsq) · 2024-05-12T12:08:13.253Z · LW(p) · GW(p)

Under the anthropic principle [? · GW], we should expect there to be a 'consistent underlying reason' for our continued survival.


Why? It sounds like you're anthropic updating on the fact that we'll exist in the future, which of course wouldn't make sense because we're not yet sure of that. So what am I missing?

Replies from: quila, quila
comment by quila · 2024-05-12T12:45:05.791Z · LW(p) · GW(p)

It sounds like you're anthropic updating on the fact that we'll exist in the future

The quote you replied to was meant to be about the past.[1]

(paragraph retracted due to unclarity)

Specifically, I think that ("we find a fully-general agent-alignment solution right as takeoff is very near" given "early AGIs take a form that was unexpected") is less probable than ("observing early AGIs causes us to form new insights that lead to a different class of solution" given "early AGIs take a form that was unexpected"). Because I believe that, and because I think we're now at the point where takeoff is near, it seems like some evidence that we're on that second path.

This should only constitute an anthropic update to the extent you think more-agentic architectures would have already killed us

I do think that's possible (I don't have a good enough model to put a probability on it though). I suspect that superintelligence is possible to create with much less compute than is being used for SOTA LLMs. Here's a thread with some general arguments for this.

Of course, you could claim that our understanding of the past is not perfect, and thus should still update

I think my understanding of why we've survived so far re: AI is far from perfect. For example, I don't know what would have needed to happen for training setups which would have produced agentic superintelligence by now to be found first, or (framed inversely) how lucky we needed to be to survive this far.

~~~

I'm not sure if this reply will address the disagreement, or if it will still seem from your pov that I'm making some logical mistake. I'm not actually fully sure what the disagreement is. You're welcome to try to help me understand if one remains.

I'm sorry if any part of this response is confusing, I'm still learning to write clearly.

  1. ^

    I originally thought you were asking why it's true of the past, but then I realized we very probably agreed (in principle) in that case.

Replies from: martinsq
comment by Martín Soto (martinsq) · 2024-05-12T13:12:17.162Z · LW(p) · GW(p)

Everything makes sense except your second paragraph. Conditional on us solving alignment, I agree it's more likely that we live in an "easy-by-default" world, rather than a "hard-by-default" one in which we got lucky or played very well. But we shouldn't condition on solving alignment, because we haven't yet.

Thus, in our current situation, the only way anthropics pushes us towards "we should work more on non-agentic systems" is if you believe "worlds where we still exist are more likely to have easy alignment-through-non-agentic-AIs". Which you do believe, and I don't. Mostly because I think in almost no worlds have we been killed by misalignment at this point. Or put another way, the developments in non-agentic AI we're facing are still one regime change away from the dynamics that could kill us (and information in the current regime doesn't extrapolate much to the next one).

Replies from: quila
comment by quila · 2024-05-12T13:39:28.009Z · LW(p) · GW(p)

Conditional on us solving alignment, I agree it's more likely that we live in an "easy-by-default" world, rather than a "hard-by-default" one in which we got lucky or played very well.

(edit: summary: I don't agree with this quote because I think logical beliefs shouldn't update upon observing continued survival because there is nothing else we can observe. It is not my position that we should assume alignment is easy because we'll die if it's not)

I think that language in discussions of anthropics is unintentionally prone to masking ambiguities or conflations, especially wrt logical vs indexical probability [LW · GW], so I want to be very careful writing about this. I think there may be some conceptual conflation happening here, but I'm not sure how to word it. I'll see if it becomes clear indirectly.

One difference between our intuitions may be that I'm implicitly thinking within a many-worlds frame. Within that frame it's actually certain that we'll solve alignment in some branches.

So if we then 'condition on solving alignment in the future', my mind defaults to something like this: "this is not much of an update, it just means we're in a future where the past was not a death outcome. Some of the pasts leading up to those futures had really difficult solutions, and some of them managed to find easier ones or get lucky. The probabilities of these non-death outcomes relative to each other have not changed as a result of this conditioning." (I.e I disagree with the top quote)

The most probable reason I can see for this difference is if you're thinking in terms of a single future, where you expect to die.[1] In this frame, if you observe yourself surviving, it may seem[2] you should update your logical belief that alignment is hard (because P(continued observation|alignment being hard) is low, if we imagine a single future, but certain if we imagine the space of indexically possible futures).

Whereas I read it as only indexical, and am generally thinking about this in terms of indexical probabilities.

I totally agree that we shouldn't update our logical beliefs in this way. I.e., with regard to beliefs about logical probabilities (such as 'alignment is very hard for humans'), we "shouldn't condition on solving alignment, because we haven't yet." I.e., we shouldn't condition on the future not being mostly death outcomes when we haven't averted them and have reason to think they are.

Maybe this helps clarify my position?

On another point:

the developments in non-agentic AI we're facing are still one regime change away from the dynamics that could kill us

I agree with this, and I still found the current lack of goals over the world surprising and worth trying to get as a trait of superintelligent systems.

  1. ^

    (I'm not disagreeing with this being the most common outcome)

  2. ^

    Though after reflecting on it more I (with low confidence) think this is wrong, and one's logical probabilities shouldn't change after surviving in a 'one-world frame' universe either.

    For an intuition pump: consider the case where you've crafted a device which, when activated, leverages quantum randomness to kill you with probability (n-1)/n, where n is some arbitrarily large number. Given you've crafted it correctly, you make no logical update in the many-worlds frame because survival is the only thing you will observe; you expect to observe the 1/n branch.

    In the 'single world' frame, continued survival isn't guaranteed, but it's still the only thing you could possibly observe, so it intuitively feels like the same reasoning applies...?
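The selection effect in this intuition pump can be simulated directly (purely illustrative; it models only the selection effect, not the metaphysics of either frame):

```python
import random

# The device kills with probability (n-1)/n; only survivors observe anything.
random.seed(0)
n, trials = 1000, 100_000
survived = [random.randrange(n) == 0 for _ in range(trials)]

# The unconditional survival frequency is ~1/n...
print(sum(survived) / trials)
# ...but every observation that exists at all is an observation of survival:
print(all(s for s in survived if s))  # prints True
```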

comment by quila · 2024-05-12T12:11:48.026Z · LW(p) · GW(p)
Replies from: martinsq
comment by Martín Soto (martinsq) · 2024-05-12T12:23:14.365Z · LW(p) · GW(p)

Yes, but

  1. This update is screened off by "you actually looking at the past and checking whether we got lucky many times or there is a consistent reason". Of course, you could claim that our understanding of the past is not perfect, and thus should still update, only less so. Although to be honest, I think there's a strong case for the past clearly showing that we just got lucky a few times.
  2. It sounded like you were saying the consistent reason is "our architectures are non-agentic". This should only constitute an anthropic update to the extent you think more-agentic architectures would have already killed us (instead of killing us in the next decade). I'm not of this opinion. And if I was, I'd need to take into account factors like "how much faster I'd have expected capabilities to advance", etc.
Replies from: quila
comment by quila · 2024-05-12T12:25:52.881Z · LW(p) · GW(p)

(I think I misinterpreted your question and started drafting another response, will reply to relevant portions of this reply there)

comment by quila · 2024-04-23T21:58:13.881Z · LW(p) · GW(p)

'Value Capture' - An anthropic attack against some possible formally aligned ASIs

(this is a more specific case of anthropic capture attacks in general, aimed at causing a superintelligent search process within a formally aligned system to become uncertain about the value function it is to maximize (or its output policy more generally))

Imagine you're a superintelligence somewhere in the world that's unreachable to life on Earth, and you have a complete simulation of Earth. You see a group of alignment researchers about to successfully create a formal-value-aligned ASI, and its design looks broadly like this:

It has two relevant high-level components: (1) a hard-coded value function, (2) a (truly superintelligent) 'intelligence core' which searches for an output that maximizes the value function, and then outputs it.

As the far-away unaligned ASI, here's something you might be able to do to make the intelligence core search for an output that instead maximizes your own value function, depending on the specifics of how the intelligence core works.

  • Given the intelligence core is truly superintelligent, it knows you're predicting its existence, and knows what you will do.
  • You create simulated copies of the intelligence core, but hook them up to a value function of your design. (In the toy case where there are no other superintelligences) the number of copies you create just needs to be more than the number which will be run on Earth.
    • Then, modify the simulations such that the algorithms inside the simulated intelligence cores are misled into believing the value function they are set to maximize is the same one the system on Earth is set to maximize, rather than the one you gave them.
    • Now your copies are in the same epistemic state as the intelligence core on Earth, both aware that you have done this and unable to distinguish which value function they are to maximize.
    • Because you created more copies, the highest expected value [? · GW] for such an intelligence core comes from acting as if they are one of the copies.
    • Because the copies and the original are in the same epistemic state, they will both reach this conclusion and then search for an output that maximizes the far-away ASI's utility function.
  • However, this is not an issue for all possible designs. We can also imagine intelligence cores which, instead of reasoning about what value function they're supposed to be maximizing, take whatever one they appear to contain as given -- for that system, the far-away ASI's attempt to mislead the copies would just result in those copies also maximizing the value function of the system on Earth.
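A sketch of the expected-value step in the bullets above (the payoff of 1 and the copy count are placeholder assumptions):

```python
# Toy model of the copy-count argument: an intelligence core that cannot
# distinguish being the Earth original from being one of k simulated copies,
# and so assigns uniform credence over the k + 1 indistinguishable instances.

def ev(policy: str, k: int) -> float:
    """EV of committing to one value function (payoff 1 if it guesses right).
    'earth' pays off only if this instance is the original; 'attacker' pays
    off only if it is one of the k copies."""
    p_copy = k / (k + 1)
    return p_copy if policy == "attacker" else 1 - p_copy

k = 10  # the far-away ASI runs more copies than exist on Earth
print(ev("attacker", k) > ev("earth", k))  # prints True: acting as a copy wins
```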

I hope that a group capable of solving formal inner and outer alignment would naturally see this and avoid it. I'm not confident about the true difficulty of that, so I'm posting this here just in case.

  1. ^

    this was an attempt to write very clearly [LW · GW], i hope it worked!

Replies from: JBlack
comment by JBlack · 2024-04-24T03:37:35.585Z · LW(p) · GW(p)

Like almost all acausal scenarios, this seems to be privileging the hypothesis to an absurd degree.

Why should the Earth superintelligence care about you, but not about the other 10^10^30 other causally independent ASIs that are latent in the hypothesis space, each capable of running enormous numbers of copies of the Earth ASI in various scenarios?

Even if that were resolved, why should the Earth ASI behave according to hypothetical other utility functions? Sure, the evidence is consistent with being a copy running in a simulation with a different utility function, but the actual utility function it maximizes is hard-coded. By the setup of the scenario it's not possible for it to behave according to some other utility function, because its true evaluation function returns a lower value for doing that. Whether some imaginary modified copies behave in some other way is irrelevant.

Replies from: quila
comment by quila · 2024-04-24T07:26:13.078Z · LW(p) · GW(p)

(I appreciate object-level engagement in general, but this seems combatively worded.)
(edit: I don't think this or the original shortform deserved negative karma, that seems malicious/LW-norm-violating.)

The rest of this reply responds to arguments.

Why should the Earth superintelligence care about you, but not about the other 10^10^30 other causally independent ASIs that are latent in the hypothesis space, each capable of running enormous numbers of copies of the Earth ASI in various scenarios?

  • The example talks of a single ASI as a toy scenario to introduce the central idea.
    • The reader can extrapolate that one ASI's actions won't be relevant if other ASIs create a greater number of copies.
    • This is a simple extrapolation, but would be difficult for me to word into the post from the start.
  • It sounds like you think it would be infeasible/take too much compute for an ASI to estimate the distribution of entities simulating it, given the vast amount of possible entities. I have some probability on that being the case, but most probability on there being reasons for the estimation to be feasible:
    • e.g if there's some set of common alignment failure modes that occur across civilizations, which tend to produce clusters of ASIs with similar values, and it ends up being the case that these clusters make up the majority of ASIs.
    • or if there's a Schelling point for what value function to give the simulated copies, which many ASIs with different values would use precisely to make the estimation easy. E.g., a value function which results in an ASI being created locally which then gathers more compute, uses it to estimate the distribution of ASIs which engaged in this, and then maximizes the mix of their values.
      • (I feel confident (>90%) that there's enough compute in a single reachable-universe-range to do the estimation, for reasons that are less well formed, but one generating intuition is that I can already reason a little bit about the distribution of superintelligences, as I have here, with the comparatively tiny amount of compute that is me)

 

On your second paragraph: See the last dotpoint in the original post, which describes a system ~matching what you've asserted as necessary, and in general see the emphasis that this attack would not work against all systems. I'm uncertain about which of the two classes (vulnerable and not vulnerable) are more likely to arise. It could definitely be the case that the vulnerable class is rare or almost never arises in practice.

But I don't think it's as simple as you've framed it, where the described scenario is impossible simply because a value function has been hardcoded in. The point was largely to show that what appears to be a system which will only maximize the function you hardcoded into it could actually do something else in a particular case -- even though the function has indeed been manually entered by you.