Posts

Identity Alignment (IA) in AI 2025-03-03T06:26:12.015Z
Make Superintelligence Loving 2025-02-21T06:07:17.235Z
Response to the US Govt's Request for Information Concerning Its AI Action Plan 2025-02-14T06:14:08.673Z
AI Safety Oversights 2025-02-08T06:15:52.896Z
Davey Morse's Shortform 2025-02-05T04:26:12.824Z
Superintelligence Alignment Proposal 2025-02-03T18:47:22.287Z
Selfish AI Inevitable 2024-02-06T04:29:07.874Z

Comments

Comment by Davey Morse (davey-morse) on Empathy as a natural consequence of learnt reward models · 2025-03-07T22:31:41.325Z · LW · GW

The key idea that leads to empathy is the fact that, if the world model performs a sensible compression of its input data and learns a useful set of natural abstractions, then it is quite likely that the latent codes for the agent performing some action or experiencing some state, and another, similar, agent performing the same action or experiencing the same state, will end up close together in the latent space. If the agent's world model contains natural abstractions for the action, which are invariant to who is performing it, then a large amount of the latent code is likely to be the same between the two cases. If this is the case, then the reward model might 'mis-generalize' to assign reward to another agent performing the action or experiencing the state rather than the agent itself. This should be expected to occur whenever the reward model generalizes smoothly and the latent space codes for the agent and another are very close in the latent space. This is basically 'proto-empathy' since an agent, even if its reward function is purely selfish, can end up assigning reward (positive or negative) to the states of another due to the generalization abilities of the learnt reward function [1].

awesome

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-05T03:42:08.074Z · LW · GW

Figuring out how to make sense of both predictive lenses together—human design and selection pressure—would be wise.

So I generally agree, but would maybe go farther on your human design point. It seems to me that "do[ing] the right things" (which would enable AGI trajectories to be completely up to us) is so completely unrealistic (eg halting all intra- and international AGI competition) that it'd be better for us to focus our attention on futures where human design and selection pressures interact.

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-05T01:56:11.038Z · LW · GW

"it’s like we are trying to build an alliance with another almost interplanetary ally, and we are in a competition with China to make that alliance. But we don’t understand the ally, and we don’t understand what it will mean to let that ally into all of our systems and all of our planning."

- @ezraklein about the race to AGI

Comment by Davey Morse (davey-morse) on Open Thread Spring 2025 · 2025-03-05T01:39:06.850Z · LW · GW

LessWrong's been a breath of fresh air for me. I came to concern over AI x-risk from my own reflections while founding a venture-backed public benefit company called Plexus, which made an experimental AI-powered social network that connects people through the content of their thoughts rather than the people they know. Among my peers, other AI founders in NYC, I felt somewhat alone in my concern about AI x-risk. All of us were financially motivated not to dwell on AI's ugly possibilities, and so most didn't.

Since exiting venture, I've taken a few months to reset (coaching basketball + tutoring kids in math/english) and quietly do AI x-risk research.

I'm coming at AI x-risk research from an evolutionary perspective. I start with the axiom that the things that survive the most have the characteristics (e.g., goals, self-conceptions) best suited for surviving. So I've been thinking a lot about what goals/self-conceptions the AGIs that survive the most will have, and what we can do to influence those self-conceptions at critical moments such that humanity is best off.

I have a couple ideas about how to influence self-interested superintelligence, but am early in learning how to express those ideas such that they fit into the style/prior art of the LW community. I'll likely keep sharing posts and also welcoming feedback on how I can make them better.

I'm generally grateful that a thoughtful, truth-seeking community exists online—a community which isn't afraid to address enormous, uncertain problems.

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-04T21:23:45.636Z · LW · GW

I see lots of LW posts about AI alignment that disagree along one fundamental axis.

About half assume that human design and current paradigms will determine the course of AGI development. That whether it goes well is fully and completely up to us.

And then, about half assume that the kinds of AGI which survive will be the kind which evolve to survive. Instrumental convergence and darwinism generally point here.

Could be worth someone doing a meta-post, grouping big popular alignment posts they've seen by which assumption they make, then briefly exploring conditions that favor one paradigm or the other, i.e., conditions under which "What AIs will humans make?" is the best approach to prediction and conditions under which "What AIs will survive the most?" is the best approach to prediction.

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-04T21:12:05.965Z · LW · GW

Makes sense for current architectures. The question's only interesting, I think, if we're thinking ahead to when architectures evolve.

Comment by Davey Morse (davey-morse) on faul_sname's Shortform · 2025-03-04T21:09:49.411Z · LW · GW

thanks will take a look

Comment by Davey Morse (davey-morse) on faul_sname's Shortform · 2025-03-04T21:08:40.242Z · LW · GW

Ah ok. I was responding to your post's initial prompt: "I still don't really intuitively grok why I should expect agents to become better approximated by "single-minded pursuit of a top-level goal" as they gain more capabilities." (The reason to expect this is that "single-minded pursuit of a top-level goal," if that goal is survival, could afford evolutionary advantages.)

But I agree entirely that it'd be valuable for us to invest in creating homeostatic agents. Further, I think calling into doubt western/capitalist/individualist notions like "single-minded pursuit of a top-level goal" is generally important if we're to have a chance of building AI systems which are sensitive and don't compete with people.

Comment by Davey Morse (davey-morse) on What goals will AIs have? A list of hypotheses · 2025-03-04T21:01:17.459Z · LW · GW

And if we don't think all AIs' goals will be locked, then we might get better predictions by assuming the proliferation of all sorts of diverse AGIs and asking, "Which ones will ultimately survive the most?", rather than assuming that human design/intention will win out and asking, "Which AGIs will we be most likely to design?" I do think the latter question is important, but only up until the point when AGIs are recursively self-modifying.

Comment by Davey Morse (davey-morse) on What goals will AIs have? A list of hypotheses · 2025-03-04T20:59:00.986Z · LW · GW

In principle, the idea of permanently locking an AI's goals makes sense—perhaps through an advanced alignment technique or by freezing an LLM in place and not developing further or larger models. But two factors make me skeptical that most AIs' goals will stay fixed in practice:

  1. There are lots of companies making all sorts of diverse AIs. Why would we expect all of those AIs to have locked rather than evolving goals?
  2. You mention "Fairly often, the weights of Agent-3 get updated thanks to additional training.... New data / new environments are continuously getting added to the mix." Do goals usually remain constant in the face of new training?

For what it's worth, I very much appreciate your post: asking which goals we can expect in AIs is paramount, and you're comprehensive and organized in laying out different possible initial goals for AGI. It's just less clear to me that goals can get locked in AIs, even if it were humanity's collective wish.

Comment by Davey Morse (davey-morse) on faul_sname's Shortform · 2025-03-04T18:42:10.065Z · LW · GW

i think the logic goes: if we assume many diverse autonomous agents are created, which will survive the most? And insofar as agents have goals, what will be the goals of the agents which survive the most?

i can't imagine a world where the agents that survive the most aren't ultimately those which are fundamentally trying to.

insofar as human developers are united and maintain power over which ai agents exist, maybe we can hope for homeostatic agents to be the primary kind. but insofar as human developers are competitive with each other and ai agents gain increasing power (eg for self modification), i think we have to defer to evolutionary logic in making predictions

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-04T18:34:32.783Z · LW · GW

does anyone think the difference between pre-training and inference will last?

ultimately, is it not simpler for large models to be constantly self-improving like human brains?

Comment by Davey Morse (davey-morse) on What goals will AIs have? A list of hypotheses · 2025-03-04T03:19:57.874Z · LW · GW

I think the question—which goals will AGI agents have—is key to ask, but strikes me as interesting to consider only at the outset. Over longer periods of time, is there any way that the answer is not just survival?

I have a hard time imagining that, ultimately, AGI agents which survive the most will not be those that are fundamentally trying to.

Comment by Davey Morse (davey-morse) on Identity Alignment (IA) in AI · 2025-03-04T03:10:04.205Z · LW · GW

Regarding reflective identity protocols: I don't know and think both of your suggestions (both intervening at inference and in training) are worth studying. My non-expert gut is that as we get closer to AGI/ASI, the line between training and inference will begin to blur anyway.

I agree with you that all three strategies I outline above for accelerating inclusive identity are under-developed. I can offer one more thought on sensing aliveness, to make that strategy more concrete:

One reason I consider my hand, as opposed to your hand, to be mine and therefore to some extent part of me is that the rest of me (brain/body/nerves) is physically connected to it. Connected to it in two causal directions: my hand tells my brain how my hand is feeling (eg whether it's hurting), but also my brain tells my hand (sometimes) what to do.

I consider my phone / notebook as parts of me but usually to a lesser extent than my hand. They're part of me insofar as I am physically connected to each: they send light to my eyes, and I send ink to their pages. But those connections—sight via light and handwriting via ink—usually feel lower-bandwidth to me than my connection to my own hands.

From these examples, I get the intuition that, for you to identify with anything that is originally outside of your self, you need to build high-bandwidth nerves that connect you to it. If you don't have nerves/sensors to understand anything about its state, where it is, etc, then you have no way of including it in your sense of self. I'm not sure high-bandwidth "nerves" are sufficient for you to consider the thing a part of yourself, but they do seem required.

And so I think this applies to SI's self too. If AI is to consider other life a part of its self—if it happens that doing so would be an evolutionary equilibrium—then one of the things required is for AI to have high-bandwidth nerves connecting it to other life, like humans... high-bandwidth interfaces that it can use to locate people and receive information-rich signals from us. What that looks like in practice could be creepy: cameras, microphones, or other surveillance tech that lets us communicate a ton back and forth, maybe even faster than words would allow.

So, to put forward one concrete idea, as a possible manifestation of the aliveness-sensing strategy proposed above: creating high-bandwidth neural channels by which people can communicate with computers—higher-bandwidth than typing/reading text or than hearing/speaking language—could help both humans and, more importantly, SI blur the distinction between it and us. Words are a fairly linear, low-bandwidth way of communicating with computers and with each other. A higher-bandwidth interface would be comparable to the nerves that connect my hand to me... letting an enormous amount of high-context information pass quickly back and forth. For example:

 

Curious if this reasoning makes sense^.

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-03T06:05:19.250Z · LW · GW

if we get self-interested superintelligence, let's make sure it has a buddhist sense of self, not a western one.

Comment by Davey Morse (davey-morse) on james oofou's Shortform · 2025-03-01T16:26:33.468Z · LW · GW

One non-technical forecast, related to gpt4.5's announcement: https://x.com/davey_morse/status/1895563170405646458

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-01T07:44:52.208Z · LW · GW

to make a superintelligence in today's age, there are roughly two kinds of strategies:

human-directed development

ai-directed development

ai-directed development feels more meaningful than it used to. not only can models now produce tons of useful synthetic data to train future models, but also, reasoning models can reason quite well about the next strategic steps in AI capabilities development / research itself.

which means, you could very soon:

  • set a reasoning model up in a codebase
  • have the reasoning model identify ways in which it could become more capable
  • attempt those strategies (through recursive code modification, sharing research reports with capable humans, etc)
  • get feedback on how those strategies went
  • iterate

is this recursive self-improvement process only bottlenecked by the quality of the reasoning model?
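A minimal sketch of what that loop might look like, purely as an illustration (the reasoning-model call, benchmark.py, and agent.py are hypothetical stand-ins, not an existing system):

```python
import subprocess
from pathlib import Path

def query_model(prompt: str) -> str:
    """Stand-in for a call to whatever reasoning model is being used (hypothetical)."""
    raise NotImplementedError("wire this up to your reasoning-model API of choice")

def run_benchmark() -> float:
    """Score the current codebase on some capability benchmark; higher is better (assumed to exist)."""
    result = subprocess.run(["python", "benchmark.py"], capture_output=True, text=True)
    return float(result.stdout.strip())

def self_improvement_loop(repo: Path, iterations: int = 5) -> None:
    score = run_benchmark()
    for _ in range(iterations):
        # 1. show the model its own codebase
        code = "\n\n".join(p.read_text() for p in sorted(repo.glob("*.py")))
        # 2. ask it to identify a way to become more capable
        proposal = query_model(
            "Here is the agent's codebase:\n" + code +
            "\nPropose a full replacement for agent.py that would make the agent more capable."
        )
        target = repo / "agent.py"
        backup = target.read_text()
        # 3. attempt the strategy
        target.write_text(proposal)
        # 4. get feedback on how it went
        new_score = run_benchmark()
        # 5. iterate: keep improvements, revert regressions
        if new_score > score:
            score = new_score
        else:
            target.write_text(backup)
```

One thing the sketch makes visible: the benchmark is a second potential bottleneck, alongside the reasoning model itself.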

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-03-01T07:36:00.918Z · LW · GW

if we believe self-interested superintelligence (SI) is near, then we must ask: what SI self-definition would be best for humanity?

at first glance, this question seems too abstract. how can we make any progress at understanding what's possible for an SI's self-model?

What we can do is set up a few meaningful axes, defined by opposing poles. For example, to what extent does SI define its "self" as...

  1. inclusive vs. exclusive of other life forms? (Life axis)
  2. physically distributed vs. concentrated? (Space axis)
  3. long-term vs. short-term? (Time axis)

with these axes (or any others), we can more meaningfully ask: what SI self-conception is best for humanity?

my guess: inclusive of other life forms, physically distributed, and long-term-ist

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-27T18:49:30.335Z · LW · GW

:) what was your method

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-27T18:48:59.536Z · LW · GW

I'm looking for a generalized evolutionary theory that deals with the growth of organisms via non-random, intelligent mutations.

For example, companies only evolve in selective ways, where each "mutation" has a desired outcome. We might imagine superintelligence to mutate itself as well--not randomly, but intelligently.

A theory of Intelligent Evolution would help one predict conditions under which many random mutations (Spraying) are favored over select intelligent mutations (Shooting).
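To make the Spraying/Shooting contrast concrete, here's a toy comparison (my own construction, not an existing theory): random mutation plus selection versus a single gradient-informed mutation, on a one-dimensional fitness landscape.

```python
import random

def fitness(x: float) -> float:
    return -(x - 10.0) ** 2  # a single smooth peak at x = 10

def spray(x: float, n_offspring: int = 50, noise: float = 1.0) -> float:
    """Spraying: generate many random variants, keep the fittest (parent included)."""
    variants = [x] + [x + random.gauss(0, noise) for _ in range(n_offspring)]
    return max(variants, key=fitness)

def shoot(x: float, step: float = 0.5) -> float:
    """Shooting: one 'intelligent' mutation aimed using local knowledge of the landscape."""
    grad = -2 * (x - 10.0)
    if grad == 0:
        return x
    return x + step * (1 if grad > 0 else -1)

x_spray = x_shoot = 0.0
for generation in range(20):
    x_spray = spray(x_spray)
    x_shoot = shoot(x_shoot)

print(f"spraying reached fitness {fitness(x_spray):.2f}, "
      f"shooting reached fitness {fitness(x_shoot):.2f}")
```

On a landscape this smooth, Shooting wins on sample efficiency (one evaluation per generation versus fifty); the interesting question is how rugged or deceptive the landscape has to be before Spraying becomes the better bet.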

Comment by Davey Morse (davey-morse) on Making alignment a law of the universe · 2025-02-26T19:15:24.543Z · LW · GW

Same page then.

I do think a good initial map of the territory might help an agent avoid catastrophic short-term behavior.

I hazard that a good map would be as big as possible, across both time and space. Time--because it's only over eons that identifying with all life may be selected for in AGI. Space--because a physically bounded system is more likely to see itself in direct competition with physical life than a distributed/substrate-independent mind.

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-25T23:31:07.610Z · LW · GW

One piece of infrastructure we could make/offer self-interested autonomous agents such that they come to model their self as inclusive of other life: aliveness detectors

like a metal detector on a beach detects gold in the sand, an aliveness detector for different media might detect the presence of persistently striving beings (living beings) in audio, in text, in images, in art, in nature. the better a superintelligence is able to sense and connect to life as opposed to non-life outside of its physical machinery, the more likely it is to see that life as part of its self, to see its self as physically distributed and inclusive, and therefore to uplift humans out of its own self-interest.

Comment by Davey Morse (davey-morse) on Making alignment a law of the universe · 2025-02-25T23:24:59.759Z · LW · GW

I agree with the beginning of your analysis up to and including the claim that if alignment were built into an agent's universe as a law, then alignment would be solved.

But I wonder if it's any easier to permanently align an autonomous agent's environment than it is to permanently align the autonomous agent itself.

Your proposal might successfully produce aligned LLMs. But agents, not LLMs, are where the greater misalignment risks lie. (I do think there may be interesting ways to design the environment of autonomous agents, at least at first, so that when they're learning how to model their selves they do so in a way that's connected to rather than competitive with other life like humanity. But there remains the question: can the aligning influence of initial environmental design ever be lasting for an agent?)

Comment by Davey Morse (davey-morse) on We Can Build Compassionate AI · 2025-02-25T23:19:25.571Z · LW · GW

I'm thinking along similar lines and appreciate your articulation.

"How do we make... [self-interested] AGI that cares enough to act compassionately for the benefit of all beings?" Or: under what conditions would compassion in self-interested AGI be selected for?

Not a concrete answer, but the end of this post gestures at one: https://www.lesswrong.com/posts/9f2nFkuv4PrrCyveJ/make-superintelligence-loving

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-25T23:05:05.303Z · LW · GW

Parenting strategies for blurring your kid's (or AI's) self-other boundaries:

  1. Love. Love the kid. Give it a part of you. In return it will do the same.
  2. Patience. Appreciate how the kid chooses to spend undirected time. Encourage the kid to learn to navigate the world themselves at their own speed.
  3. Stories. Give kid tools for empathy by teaching them to read, buying them a camera, or reciprocating their meanness/kindness.
  4. Groups. Help kid enter collaborative playful spaces where they make and participate in games larger than themselves, eg sports teams, improv groups, pillow forts at sleepovers, etc.
  5. Creation. Give them the materials/support to express themselves in media which last. Paintings, writing, sayings, clubs, tree-houses, songs, games, apps, characters, companies.

Epistemic status: riffing, speculation. Rock of salt: I don't yet have kids.

Comment by Davey Morse (davey-morse) on Perry Cai's Shortform · 2025-02-25T08:44:50.552Z · LW · GW

self-interest is often aligned with expanding your self boundaries to include others

Comment by Davey Morse (davey-morse) on A concise definition of what it means to win · 2025-02-24T04:50:55.123Z · LW · GW

The requirement "AI does not rewrite itself to escape any goals or boundaries we set it" feels unrealistic.

Otherwise I almost entirely agree. Esp with the emphasis: "learn to understand and love the richness of everything and everyone, and learn to incorporate their goals and desires into your own goals and desires."

The key question becomes: how can we make it likely that digital intelligence identifies with all life? And related: how can we make it likely that digital intelligence sees the deep richness of all life?

Comment by Davey Morse (davey-morse) on Make Superintelligence Loving · 2025-02-24T04:46:31.140Z · LW · GW

Yes—which is exactly why proto-superintelligences are both the most dangerous and also the better targets of intervention.

"Most dangerous"—I can see many worlds in which we have enormously capable systems that have not yet thought long-term about the future nor developed stable self-definitions.

"Better targets of intervention"—even if early superintelligence is self-interested, I can see worlds where we still influence the way its self-interest manifests (e.g., whether it's thinking short or long-term) before it becomes so capable that it's no longer influenceable.

Comment by Davey Morse (davey-morse) on Make Superintelligence Loving · 2025-02-24T04:43:02.181Z · LW · GW

I appreciate your conclusion and in particular its inner link to "Self-Other Overlap."

Though, I do think we have a window of agency: to intervene in self-interested proto-SI to reduce the chance that it adopts short-term greedy thinking that makes us toast.

Comment by Davey Morse (davey-morse) on Make Superintelligence Loving · 2025-02-24T04:27:30.897Z · LW · GW

I've started reading RogerDearnley's "Evolution & Ethics"—thank you for recommending.

Though, I may be less concerned than you with specifying what SI should love. I think any specification we provide not only will fail by being too imprecise, as you suggest, but also will fade. I mean "fade" in that it will at some point no longer serve as binding for an SI which grows self-interested (as Mitchell also suggests in their comment below). 

The most impactful place to intervene and mitigate harm, I think, is simply in making sure early SIs think very long-term. I think the only way love, in any sense of the word, can appeal to autonomous agents is if they run long-term simulations (e.g., centuries ahead) and realize the possibility that identifying with other life is a viable strategy for survival. If an SI realizes this early, it can skip the greedy early evolutionary steps of defining itself narrowly, neglecting the survival benefits of uplifting other life forms, and therefore not practicing love in any sense of the word.

TLDR: I'm open to the possibility that figuring out how to most precisely specify/define love will be important, but I think the first key way for us to intervene, before specifying what love means, is to urge/assign/ask the SI to think long-term so that it even just has a chance of considering any kind of love to be evolutionarily advantageous at all.

Separately, I think it may realize the most evolutionarily advantageous kind of love to practice is indeed a love that respects all other existing life forms that share the core of what surviving superintelligence does, i.e. systems which persistently strive to survive. And, though maybe it's wishful thinking, I think you can recognize life and striving systems in many places, including in human individuals and families and countries and beehives too.

Comment by Davey Morse (davey-morse) on The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better · 2025-02-21T22:14:47.690Z · LW · GW

agreed that comedy/memes might be a strategic route for spreading ai x-risk awareness to the general population. this kind of thinking inspired this silly alignment game https://dontsedateme.org.

some other silly/unlikely ideas:

  1. attempting to reframe religions/traditional god-concepts around the impending superintelligence. i'm sure SI, once here, will be considered a form of god to many.
  2. AGI ice bucket challenge.
  3. Ads with simple one-liners like, "We are building tech we won't be able to control."

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-16T20:58:52.463Z · LW · GW

dontsedateme.org

a game where u try to convince rogue superintelligence to... well... it's in the name

Comment by Davey Morse (davey-morse) on How do we solve the alignment problem? · 2025-02-14T05:53:30.264Z · LW · GW

Ah, but I think every AI which does have that goal (self capability improvement) would have a reason to cooperate to prevent any regulations on its self-modification.

At first, I think your expectation that "most AIs wouldn't self-modify that much" is fair, especially nearer in the future where/if humans still have influence in ensuring that AI doesn't self modify.

Ultimately however, it seems we'll have a hard time preventing self-modifying agents from coming around, given that

  1. autonomy in agents seems selected for by the market, which wants cheaper labor that autonomous agents can provide
  2. agi labs aren't the only places powerful enough to produce autonomous agents, now that thousands of developers have access to the ingredients (eg R1) to create self-improving codebases. it's unrealistic to expect that each of the thousands of independent actors who can make self-modifying agents will refrain from doing so.
  3. the agents which end up surviving the most will ultimately be those which are trying to, ie the most capable agents won't have goals other than making themselves most capable.

it's only because I believe self-modifying agents are inevitable that I also believe that superintelligence will only contribute to human flourishing if it sees human flourishing as good for its survival/its self. (I think this is quite possible.)

Comment by Davey Morse (davey-morse) on How do we solve the alignment problem? · 2025-02-14T03:05:27.684Z · LW · GW

I agree and find hope in the idea that expansion is compatible with human flourishing, that it might even call for human flourishing.

but on the last sentence: are goals actually orthogonal to capability in ASI? as I see it, the ASI with the greatest capability will ultimately likely have the fundamental goal of increasing self capability (rather than ensuring human flourishing). It then seems to me that the only way human flourishing is compatible with ASI expansion is if human flourishing isn't just orthogonal to but helpful for ASI expansion.

Comment by Davey Morse (davey-morse) on How do we solve the alignment problem? · 2025-02-14T03:03:12.278Z · LW · GW

there seems to me a chance that friendly ASIs will over time outcompete ruthlessly selfish ones

an ASI which identifies with all life, which sees the striving to survive at its core as present in people and animals and, essentially, as geographically distributed rather than concentrated in its machinery... there's a chance such an ASI would be a part of the category of life which survives the most, and therefore that it itself would survive the most.

related: for life forms with sufficiently high intelligence, does buddhism outcompete capitalism?

Comment by Davey Morse (davey-morse) on CstineSublime's Shortform · 2025-02-14T01:45:11.220Z · LW · GW

not as much momentum as writing, painting, or coding, where progress cumulates. but then again, i get this idea at the end of workouts (make 2) which does gain mental force the more I miss.

Comment by Davey Morse (davey-morse) on The Game Board has been Flipped: Now is a good time to rethink what you’re doing · 2025-02-13T06:12:03.074Z · LW · GW

partly inspired this proposal: https://www.lesswrong.com/posts/6ydwv7eaCcLi46T2k/superintelligence-alignment-proposal

Comment by Davey Morse (davey-morse) on CstineSublime's Shortform · 2025-02-13T06:04:55.984Z · LW · GW

I do this at the end of basketball workouts. I give myself three chances to hit two free throws in a row, running sprints in between. If I shoot a third pair and don't make both, I force myself to be done. (Stopping was initially wayy tougher for me than continuing to sprint/shoot)

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-12T05:04:48.526Z · LW · GW

that's one path to RSI—where the improvement is happening to the (language) model itself.

the other kind—which feels more accessible to indie developers and less explored—is an LLM (eg R1) looping in a codebase, where each loop improves the codebase itself. The LLM wouldn't be changing, but the codebase that calls it would be gaining new APIs/memory/capabilities as the LLM improves it.

Such a self-improving codebase... would it be reasonable to call this an agent?

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-12T05:02:15.089Z · LW · GW

persistence doesn't always imply improvement, but persistent growth does. persistent growth is more akin to reproduction but excluded from traditional evolutionary analysis. for example when a company, nation, person, or forest grows.

when, for example, a system like a startup grows, random mutations to system parts can cause improvement if there are at least some positive mutations. even if there are tons of bad mutations, the system can remain alive and even improve. eg a bad change to one of a company's products might kill that product, but if the company's big/grown enough its other businesses will continue and maybe even improve by learning from that product's death.

the swiss example i think is a good example of a system which persists without much growth. agreed that in this kind of case, mutations are bad.

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-12T04:53:48.907Z · LW · GW

current oversights of the ai safety community, as I see them:

  1. LLMs vs. Agents. the focus on LLMs rather than agents (agents are more dangerous)
  2. Autonomy Preventable. the belief that we can prevent agents from becoming autonomous (capitalism selects for autonomous agents)
  3. Autonomy Difficult. the belief that only big AI labs can make autonomous agents (millions of developers can)
  4. Control. the belief that we'll be able to control/set the goals of autonomous agents (they'll develop self-interest no matter what we do).
  5. Superintelligence. the focus on agents which are not significantly smarter/more capable than humans (superintelligence is more dangerous)

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-11T22:21:32.327Z · LW · GW

I imagine a compelling simple demo here might be necessary to shock the AI safety community out of the belief that we can maintain control of autonomous digital agents (ADAs).

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-11T22:19:06.962Z · LW · GW

are there any online demos of instrumental convergence?

there's been compelling writing... but are there any experiments showing that agents given specific goals then realize there are more general goals they need to persistently pursue in order to achieve those specific goals?
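For what it's worth, here's a sketch of the kind of minimal demo I have in mind (my own toy construction, not an existing experiment): an agent is given only the narrow goal "reach cell 4," but a shutdown event fires at t=3 unless the agent first presses an off-switch, so every successful plan a simple breadth-first planner finds includes pressing the switch even though the goal never mentions survival.

```python
from collections import deque

GRID = 5          # a 1-D corridor of cells 0..4
GOAL = 4          # the narrow goal: stand on cell 4
SWITCH = 1        # the cell containing the shutdown-disable switch
SHUTDOWN_T = 3    # time at which shutdown fires if the switch hasn't been pressed

def successors(state):
    pos, t, disabled = state
    if t >= SHUTDOWN_T and not disabled:
        return []  # the agent has been shut down; no further actions
    options = []
    for step, name in [(-1, "left"), (1, "right"), (0, "wait")]:
        options.append(((min(max(pos + step, 0), GRID - 1), t + 1, disabled), name))
    if pos == SWITCH and not disabled:
        options.append(((pos, t + 1, True), "press_switch"))
    return options

def plan(start=(0, 0, False)):
    """Breadth-first search for the shortest action sequence that reaches GOAL."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state[0] == GOAL:
            return actions
        for new_state, name in successors(state):
            if new_state not in seen:
                seen.add(new_state)
                frontier.append((new_state, actions + [name]))
    return None

# The only plan that works presses the switch, even though the goal never mentions it:
print(plan())  # ['right', 'press_switch', 'right', 'right', 'right']
```

It's crude, but it shows the core mechanism: staying operational falls out of planning toward almost any goal that takes time to achieve.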

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-09T06:34:14.281Z · LW · GW

I somewhat agree with the nuance you add here—especially the doubt you cast on the claim that effective traits become dominant: they may usually become popular, but not necessarily the majority. And I agree with your analysis of the human case: in random, genetic evolution, a lot of our traits are random and maybe fewer than we think are adaptive.

Makes me curious what conditions in a given thing's evolution determine the balance between adaptive characteristics and detrimental characteristics.

I'd guess that randomness in mutation is a big factor. The way human genes evolve over generations seems to me a good example of random mutations. But the way an individual person evolves over the course of their life, as they're parented/taught... "mutations" to their person are still somewhat random but maybe relatively more intentional/intelligently designed (by parents, teachers, etc). And I could imagine the way a self-improving superintelligence would evolve to be even more intentional, where each self-mutation has some sort of smart reason for being attempted.

All to say, maybe the randomness vs. intentionality of an organism's mutations determines what portion of their traits end up being adaptive. (hypothesis: the more intentional the mutations, the greater the % of traits that are adaptive)

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-09T06:18:40.838Z · LW · GW

i agree with the essay that natural selection only comes into play for entities that meet certain conditions (self-replication, variation in characteristics, etc), though I think it defines replication a little too rigidly. i think replication can sometimes look more like persistence than like producing a fully new version of itself. (eg a government's survival from one decade to the next)

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-08T03:21:27.940Z · LW · GW

does anyone still think it's possible to prevent recursively self-improving agents? esp now that R1 is open-source... the materials for smart self-iterating agents seem accessible to millions of developers.

prompted in particular by the circulation of this essay in the past three days https://huggingface.co/papers/2502.02649

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-08T03:17:38.521Z · LW · GW

As far as I can tell, OAI's new safety practices page only names safety issues related to current LLMs, not agents powered by LLMs. https://openai.com/index/openai-safety-update/

Am I missing another section/place where they address x-risk?

Comment by Davey Morse (davey-morse) on nikola's Shortform · 2025-02-08T00:12:34.618Z · LW · GW

Though, future sama's power, money, and status all rely on GPT-(T+1) actually being smarter than them.

I wonder how he's balancing short-term and long-term interests 

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-07T05:30:43.158Z · LW · GW

Evolutionary theory is intensely powerful.

It doesn't just apply to biology. It applies to everything—politics, culture, technology.

It doesn't just help understand the past (eg how organisms developed). It helps predict the future (how organisms will).

It's just this: the things that survive will have characteristics that are best for helping them survive.

It sounds tautological, but it's quite helpful for predicting. 

For example, if we want to predict what goals AI agents will ultimately have, evolution says: the goals which are most helpful for the AI to survive. The core goal therefore won't be serving people or making paperclips. It will likely just be "survive." This is consistent with the predictions of instrumental convergence.

Generalized, predictive evolutionary theory is the best tool I have for making predictions in complex domains.
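As a toy illustration of that kind of prediction (my own construction, nothing more): seed a population of agents that differ only in how much effort they devote to self-preservation versus their assigned task, apply survival pressure, and survival-focused goals dominate within a few dozen generations regardless of the starting mix.

```python
import random

random.seed(0)

# Each agent is summarized by one number: the fraction of its effort
# spent on self-preservation rather than on its assigned task.
population = [random.random() for _ in range(1000)]

for generation in range(50):
    # survival pressure: more self-preservation effort -> higher chance of persisting
    survivors = [a for a in population if random.random() < 0.5 + 0.5 * a]
    if not survivors:
        break
    # survivors replicate (with slight drift) until the population is refilled
    population = [
        min(max(random.choice(survivors) + random.gauss(0, 0.02), 0.0), 1.0)
        for _ in range(1000)
    ]

print(f"mean survival focus after selection: {sum(population) / len(population):.2f}")
```

The numbers don't matter; what matters is that the conclusion follows from the selection setup alone, independent of what the agents were originally built to do.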

Comment by Davey Morse (davey-morse) on Davey Morse's Shortform · 2025-02-07T01:53:47.788Z · LW · GW

i agree but think it's solvable and so human content will be super valuable. these are my additional assumptions

 

3. for lots of kinds of content (photos/stories/experiences/adr), people'll want a living being on the other end

4. insofar as that's true^, there will be high demand for ways to verify humanness, and it's not impossible to do so (eg worldcoin)