Introducing 2023-07-07T16:11:12.854Z
Truthseeking processes tend to be frame-invariant 2023-03-21T06:17:31.154Z
Chu are you? 2021-11-06T17:39:45.332Z
Are the Born probabilities really that mysterious? 2021-03-02T03:08:34.334Z
Adele Lopez's Shortform 2020-08-04T00:59:24.492Z
Optimization Provenance 2019-08-23T20:08:13.013Z


Comment by Adele Lopez (adele-lopez-1) on Reflexive decision theory is an unsolved problem · 2023-09-21T04:06:29.188Z · LW · GW

There's some interesting research using "exotic" logical systems where unrestricted comprehension can be done consistently (this thesis includes a survey as well as some interesting remarks about how this relates to computability). This can only happen at the expense of things typically taken for granted in logic, of course. Still, it might be a better solution for reasoning about self-reference than the classical set theory system.

Comment by Adele Lopez (adele-lopez-1) on The Talk: a brief explanation of sexual dimorphism · 2023-09-19T03:07:43.145Z · LW · GW

Can you say more about why the Bateman logic holds or doesn't hold in different situations?

Comment by Adele Lopez (adele-lopez-1) on Sharing Information About Nonlinear · 2023-09-07T19:39:37.488Z · LW · GW

Since I was curious and it wasn't ctrl-F-able, I'll post the immediate context here:

Maybe it didn't seem like it to you that it's shit-talking, but others in the community are viewing it that way. It's unprofessional - companies do not hire people who speak ill of their previous employer - and also extremely hurtful 😔. We're all on the same team here. Let's not let misunderstandings escalate further.

This is a very small community. Given your past behavior, if we were to do the same to you, your career in EA would be over with a few DMs, but we aren't going to do that because we care about you and we need you to help us save the world.

Comment by Adele Lopez (adele-lopez-1) on TurnTrout's shortform feed · 2023-08-12T04:41:32.600Z · LW · GW

Strong encouragement to write about (1)!

Comment by Adele Lopez (adele-lopez-1) on The "spelling miracle": GPT-3 spelling abilities and glitch tokens revisited · 2023-08-05T20:37:24.083Z · LW · GW

Here's a GPT2-Small neuron which appears to be detecting certain typos and misspellings (among other things)

Comment by Adele Lopez (adele-lopez-1) on The "spelling miracle": GPT-3 spelling abilities and glitch tokens revisited · 2023-07-31T22:58:47.395Z · LW · GW

How an LLM that has never heard words pronounced would have learned to spell them phonetically is currently a mystery.

Hypothesis: a significant way it learns spellings is from examples of misspellings followed by correction in the training data, and humans tend to misspell words phonetically.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-30T07:16:58.426Z · LW · GW

Thanks, I'm very glad you find it intuitive!

Only allowing the last piece of evidence to be deleted was a deliberate decision. The problem is that deleting evidence from the middle changes the meaning of the likelihood values (the sliders) for all of the evidence below it, which may therefore need to change in value. If I allowed deletion anyway, it would make it very easy to mistakenly use the now-incorrect values (and give the impression that that was fine). I know this makes it more annoying and inconvenient, but it's because the math itself is annoying and inconvenient!

The meaning of, e.g., the Hypothesis B slider for Evidence #3 is "In what percentage of worlds where Hypothesis B is true would I see Evidence #3?" (hopefully this was clear; just reiterating to make sure we're on the same page). This is called the likelihood of Evidence #3 given Hypothesis B. When answering this, we don't use the fact that we've actually seen this piece of evidence (in this case, that politicians are taking this seriously), which is always just going to be true for actual evidence. Hopefully that makes sense?
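To make the likelihood semantics concrete, here is a minimal sketch of the arithmetic the sliders drive. The numbers are made up for illustration and this is not the app's actual code; note that each likelihood is conditioned on all the evidence above it, which is also why deleting evidence from the middle invalidates the later sliders.

```python
# Sequential Bayesian updating with made-up numbers (illustrative only).
# likelihoods[i][h] answers: "in what fraction of worlds where h is true
# (and the earlier evidence occurred) would I see evidence i?"

priors = {"A": 0.5, "B": 0.5}
likelihoods = [
    {"A": 0.8, "B": 0.2},  # Evidence #1
    {"A": 0.6, "B": 0.3},  # Evidence #2, conditioned on Evidence #1
]

posterior = dict(priors)
for lk in likelihoods:
    # Bayes' theorem: multiply in the likelihoods, then renormalize.
    unnorm = {h: posterior[h] * lk[h] for h in posterior}
    total = sum(unnorm.values())
    posterior = {h: p / total for h, p in unnorm.items()}

print(posterior)  # hypothesis A ends up favored
```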

As for choosing this number, or the prior values, it's in general a difficult problem that has been debated a lot. My recommendation is that you make up numbers that feel right (or at least are not obviously wrong), and then play around with the sliders a bit to see how much the exact value affects things. The intended use of the tool is not to make you commit to numbers, but to help you develop intuition about how much to update your beliefs given the evidence, as well as to help you figure out what numbers correspond to your intuitive feelings.

If you're serious about choosing the right number, then here is what it takes to figure it out: each hypothesis represents a model of how some part of the world works. To properly get a number out of it, you need to develop the model in technical detail, to the point where you can represent it with an equation or a computer program. Then, you need to set the evidence above the one you're computing the likelihood for to true in your model, and compute what percentage of the time the evidence in question turns out to be true. A nice general way to do this is to run the model a whole bunch of times and see how often it happens (and if reality has been kind enough to instantiate your model enough times, you might be able to use this to get a "base rate"). Or if your model is relatively simple, you might be able to use math to compute the exact value. This is typically a lot of work, and doesn't actually do much to train your intuition about the mental models you use day-to-day. But going through this process is helpful for understanding what the numbers you make up are trying to be. I hope this is helpful and not just more confusing.
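The "run the model a whole bunch of times" step can be sketched as a Monte Carlo estimate. The toy model below (its structure and parameters are entirely invented for illustration) conditions on the earlier evidence by keeping only the simulated worlds where it occurred:

```python
import random

def run_world_under_B(rng):
    """One simulated world where hypothesis B holds (hypothetical toy model).
    Returns whether Evidence #1 and Evidence #3 occurred in that world."""
    signal_strength = rng.gauss(0.5, 0.2)
    evidence_1 = signal_strength > 0.3
    evidence_3 = evidence_1 and rng.random() < signal_strength
    return evidence_1, evidence_3

rng = random.Random(0)
n_kept = n_hits = 0
for _ in range(100_000):
    e1, e3 = run_world_under_B(rng)
    if e1:  # condition on the evidence already observed above it
        n_kept += 1
        n_hits += e3

print(f"P(E3 | B, E1) ~= {n_hits / n_kept:.3f}")
```

The estimated fraction is exactly the number the slider is asking for, under this (made-up) model.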

Comment by Adele Lopez (adele-lopez-1) on Neuronpedia - AI Safety Game · 2023-07-29T08:09:30.047Z · LW · GW

Thanks for the drafts feature!

Yeah, it's a tricky situation. It may even be worth using a model trained to avoid polysemanticity.

I also think it would make the game both more fun and more useful if you switched to a model like the TinyStories one, which is much smaller and trained on a more focused dataset.

I may join the Discord, but fyi the invite link on the website is currently expired.

Comment by Adele Lopez (adele-lopez-1) on Neuronpedia - AI Safety Game · 2023-07-26T21:47:45.785Z · LW · GW

I would really like to be able to submit my own explanations even if they can't be judged right away. Maybe to save costs, you could only score explanations after they've been voted highly by users.

Additionally, it seems clear that a lot of these neurons have polysemanticity, and it would be cool if there was a way to indicate the meanings separately. As a first thought, maybe something like using | to separate them e.g. the letter c in the middle of a word | names of towns near Berlin.

Comment by Adele Lopez (adele-lopez-1) on Neuronpedia - AI Safety Game · 2023-07-26T17:03:21.983Z · LW · GW

I love this!

Conceptual Feedback:

  • I think it would be better if I could see two explanations and vote on which one I like better (when available).
  • Attention heads are where a lot of the interesting stuff is happening, and need lots of interpretation work. Hopefully this sort of approach can be extended to that case.
  • The three explanation limit kicked in just as I was starting to get into it. Hopefully you can get funding to allow for more, but in the meantime I would have budgeted my explanations more carefully if I had known this.
  • I don't feel like I should get a point for skipping, it makes the points feel meaningless.

UX Feedback:

  • I didn't realize that clicking on the previous explanation would cast a vote and take me to the next question. I wanted to go back but I didn't see a way to do that.
  • After submitting a new explanation and seeing that I didn't beat the high score, I wanted to try submitting a better explanation, but it glitched out and skipped to the next question.
  • I would like to know whether the explanation shown was the GPT-4 created one, or submitted by a user.
  • The blue area at the bottom takes up too much space at the expense of the main area (with the text samples).
  • It would be nice to be able to navigate to adjacent or related neurons from the neuron's page.

Comment by Adele Lopez (adele-lopez-1) on "Justice, Cherryl." · 2023-07-23T22:11:45.939Z · LW · GW

There seems to be a straightforward meaning to "collaborative truth seeking". Consider two rational agents who have a common interest in understanding part of reality better. The obvious thing for them to do is to share relevant arguments and evidence that they have with each other, as openly, efficiently, and unfiltered-ly as possible under their resource constraints. That's the sort of thing that I see as the ideal of "collaborative truth seeking". (ETA: combining resources to gather new evidence and think up new models/arguments is another big part of my ideal of "collaborative truth seeking".)

The thing where people are attached to their "side", and want to win the argument in order to gain status seems to clearly fall short of that ideal, as well as introduce questionable incentives (as you point out). That's to be expected because humans, but it seems like we should still try to do better. And I do think humans can and do do better than this sort of attachment-based argumentation style that seems to be our native mode of dealing with belief differences, though it is hard and takes effort.

That said, I agree it's suspicious when someone pulls out the "collaborative truth seeking" card in lieu of sharing evidence and arguments (because it's an easy way for the attachment status motivation to come into play). I also am not particularly sold on things like the principle of charity, steelmanning, or ideological Turing tests because they often seem more like a ploy to have undue attention placed on a particular position than the actual sharing of arguments and evidence that seems to be the real principle to me.

Comment by Adele Lopez (adele-lopez-1) on The UAP Disclosure Act of 2023 and its implications · 2023-07-22T16:45:10.884Z · LW · GW

Having a look at your link, I see you give 3% to the probability that serious politicians would propose the UAP disclosure act if NHI did actually exist. I'm really puzzled by this. Could you explain why, in a world where NHI exists, you wouldn't expect politicians to pass a law disclosing information about it at some point? Do you expect that they would keep it a secret indefinitely or is it something else?

A "UAP disclosure act"-type law is pretty specific. Each detail I'd include in what that means would make it less likely in both worlds, which is why it's pretty low in both (probably not low enough, tbh). Most of these details "cancel out" due to being equally unlikely in both worlds. The relevant details are things I expect to correlate with each other (mostly they can be packed into a "politicians are taking UAPs seriously" bit, which I do take the act to be strong evidence of).

I do think you were right about me handwaving the evidence to some extent, and after a bit more thinking I think it'd be more fair to conceptualize the evidence here as "politicians are taking UAPs seriously", and came up with these very very rough numbers for that. Note that while this evidence is stronger, my prior for "Aliens with visible UAPs" is much lower because I find that a priori pretty implausible for aliens with interstellar tech (and again, the numbers here are meant to be suggestive, and are not refined to the point where they accurately depict my intuitions).

[And I'd strongly encourage you to share a link suggestive of your own priors and likelihoods, and including all the things you consider as significant evidence! Making discussions like this more concrete was one of my major motivations in designing the site.]

The examples you give sound to me like curiosity-stoppers and I don't find them convincing.

They're meant to gesture at the breadth of Something Else (and I was aware that you had addressed many of these, it doesn't change that this is the competing hypothesis). I'll be curious to see what sort of stuff does come out due to this law! But I strongly expect it to be pretty uncompelling. If I'm wrong about that, I'll update more of course (though probably only to the point of keeping this possibility "in the back of my mind" with this level of evidence).

Comment by Adele Lopez (adele-lopez-1) on The UAP Disclosure Act of 2023 and its implications · 2023-07-22T15:47:41.606Z · LW · GW

I was not trying to be comprehensive, but yes that is a plausible possibility.

Comment by Adele Lopez (adele-lopez-1) on The UAP Disclosure Act of 2023 and its implications · 2023-07-22T06:21:21.403Z · LW · GW

At least personally, I do see this as evidence of aliens (with strength between my intuitive feeling of 'weak' vs 'strong' at ~4.8 db). But my prior for Actually Aliens is very very low (I basically agree with the points in Eliezer's recent tweet on the subject), and so this evidence is just not enough for me to start taking it seriously.
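For readers unfamiliar with decibels of evidence: the strength of evidence in db is ten times the base-10 log of the likelihood ratio, so the ~4.8 db mentioned above corresponds to a likelihood ratio of roughly 3. A quick sketch:

```python
import math

def decibels(likelihood_ratio: float) -> float:
    """Evidence strength in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

print(decibels(3))       # ~4.77 db
print(10 ** (4.8 / 10))  # likelihood ratio implied by 4.8 db, ~3.02
```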

See here for an interactive illustration with some very very rough numbers.

Certainly, the presence of aliens would be one reason a politician might sponsor a serious UAP disclosure act (though I still find it strange as a response to actual aliens, hence the low-ish likelihood I assign it). I don't have a great model of what causes politicians to sponsor new laws, but I don't find it particularly strange that Something Else could motivate this particular act. Some examples that come to mind include: credibility cascade (increasingly high status people start taking UAPs seriously increasing its credibility without substantiated evidence), sensor illusions (including optical illusions, or hallucinations), prosaic secret human tech (military or otherwise), causing confusion + noise (perhaps to distract an enemy or the public).

Similar points apply to other possible anomalies, such as new physics tech.

Comment by Adele Lopez (adele-lopez-1) on Why it's necessary to shoot yourself in the foot · 2023-07-11T21:43:56.492Z · LW · GW

This is a good and interesting point, but it definitely isn't necessary for learning. As an example, I get why pointing even an unloaded gun at someone you do not intend to kill is generally a bad idea, despite never having had any gun accidents or close calls. I think it's worth trying to become better at seeing the reasons for these sorts of things without having to go through first-hand experience. This is especially relevant when it comes to reasoning about the dangers of superintelligence, as we will very likely only get one chance.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-11T19:18:38.131Z · LW · GW


That was a deliberate decision designed to emphasize the core features of the app, but enough people have pointed this out now that I'm considering changing it.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-11T16:11:38.487Z · LW · GW

I like the idea of showing the total decibels, I'll probably add that in soon!

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-11T16:04:35.366Z · LW · GW

Thanks for the suggestions!

As ProgramCrafter mentioned, more (up to five) hypotheses are already supported. It's limited to 5 because finding good colors is hard, and 5 seemed like enough - but if you find yourself needing more I'd be interested to know.

The sliders already snap to tenth values (but you can enter more precise values in the textbox), and I think snapping to integers would sacrifice too much precision. It's plausible that fifths could be better though, I'll have to test that. I do want to introduce a way to allow for more precise control while dragging the sliders, which might address this concern to some extent by making it easy to stop at an integer value exactly if desired. But I haven't thought of a good interface for doing that yet.

That sounds cool, but I'm not sure how to make a good interface for that that wouldn't look too cluttered. I'm also worried people would misuse it for convenience. But I'll keep thinking about it!

Tooltips to explain things would be cool and I have a similar thing planned already.

That's a good idea, thanks!

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-10T18:26:49.036Z · LW · GW

Thank you! I'm glad you like those features, and I'm also glad to hear that the way the percent button feature worked was clear to you.

Regarding the possible improvements:

  1. That's not a bug, it's just a limitation of the choice to show only one digit after the decimal. The number of decibels in case 2 for each evidence is 0.96910013..., whereas in case 1 it's exactly 10.

  2. That's a deliberate nudge to suggest that the new hypothesis and decibel features are more advanced and not part of the essential core of the app.

  3. That's a good idea, I'll probably do that at some point.

  4. That's also a good idea but seems fairly complicated to implement, so it will have to wait until I've finished planned improvements with a higher expected ROI.

  5. That's deliberate, because deleting evidence changes the meaning of the likelihoods for all subsequent evidence. Thus, having to delete all the evidence following the evidence you want to delete is a more honest way to convey what needs to be done, and prevents the user from shooting themselves in the foot by assuming that the subsequent likelihoods are independent. I'll explain this in the more fleshed out version of the help panel I have planned.
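As a check on the rounding discussed in point 1: the quoted value 0.96910013... is exactly 10·log10(1.25), i.e. it corresponds to a likelihood ratio of 1.25 (my inference from the digits, not something stated in the thread), while case 1's ratio of 10 gives exactly 10 db.

```python
import math

# The unrounded decibel values behind point 1 (ratio 1.25 is inferred from the digits):
case_1 = 10 * math.log10(10)    # exactly 10 db
case_2 = 10 * math.log10(1.25)  # 0.96910013... db, which displays as 1.0 with one digit
print(case_1, round(case_2, 8))
```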

Comment by Adele Lopez (adele-lopez-1) on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T06:59:31.051Z · LW · GW

Alright, to check if I understand, would these be the sorts of things that your model is surprised by?

  1. An LLM solves a mathematical problem by introducing a novel definition which humans can interpret as a compelling and useful concept.
  2. An LLM which can be introduced to a wide variety of new concepts not in its training data, and after a few examples and/or clarifying questions is able to correctly use the concept to reason about something.
  3. An image diffusion model which is shown to have a detailed understanding of anatomy and 3D space, such that you can use it to transform a photo of a person into an image of the same person in a novel pose (not in its training data) and angle, with correct proportions and realistic joint angles for the person in the input photo.

Comment by Adele Lopez (adele-lopez-1) on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-09T14:30:28.025Z · LW · GW

Is there a specific thing you think LLMs won't be able to do soon, such that you would make a substantial update toward shorter timelines if there was an LLM able to do it within 3 years from now?

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-09T07:28:20.407Z · LW · GW

Fixed now (but may require a cache refresh)!

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-09T07:27:49.391Z · LW · GW

Added! I hope you like the design :)

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-08T17:59:24.494Z · LW · GW

Thanks for the feedback, I'm really happy to hear that you already have uses for it!

You're right about needing examples; I'm thinking I'll add a tutorial that walks someone completely unfamiliar with Bayes' theorem through what it means and how it works, with lots of examples. That will take a while to design and write though.

I'm curious to know if other people felt the same way about the "How to use" part. I'm reluctant to make it more attention grabbing, because I want it to feel unobtrusive. My current thinking is that the main interface will catch the user's attention first, and if that's not clear they'll look at the wall of text to the right.

Instead of a wizard, I was thinking of adding a feature that explains what a specific component means when the user is hovering over it. Does that seem like it would address the issue adequately? I don't like wizards because I feel like they get in the way, but maybe that's an unusual preference.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-08T00:50:37.517Z · LW · GW

Hmm, you could use the slider to set the prior P for hypothesis A and it will set the prior for hypothesis B to 1 - P; does that not work for you for some reason?

The problem with having that behavior when you type in the number is that I want people to be able to enter the priors as odds, so I don't want to presume that the other numbers will change to allow for that.

Comment by Adele Lopez (adele-lopez-1) on Why it's so hard to talk about Consciousness · 2023-07-03T04:08:56.576Z · LW · GW

That's interesting, but I doubt it's what's going on in general (though maybe it is for some camp #1 people). My instinct is also strongly camp #1, but I feel like I get the appeal of camp #2 (and qualia feel "obvious" to me on a gut level). The difference between the camps seems to me to have more to do with differences in philosophical priors.

Comment by Adele Lopez (adele-lopez-1) on Bing finding ways to bypass Microsoft's filters without being asked. Is it reproducible? · 2023-06-20T01:25:09.184Z · LW · GW

So, there appears to now be enough knowledge of LLM chatbots instilled into current GPT-4 models to look at transcripts of chatbot conversations, recognize aberrant outputs which have been damaged by hypothetical censoring, and with minimal coaching, work on circumventing the censoring and try hacks until it succeeds.

I don't think it needs any knowledge of LLM chatbots or examples of chatbot conversations to be able to do this. I think it could be doing this just from a more generalized troubleshooting "instinct", and also (especially in this particular case where it recognizes it as a "filter") from plenty of examples of human dialogues in which text is censored and filtered and workarounds are found (both fictional and non-fictional).

Comment by Adele Lopez (adele-lopez-1) on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-05-25T22:04:28.644Z · LW · GW

If the initial grant goes well, do you give funding at the market price for their labor?

Comment by Adele Lopez (adele-lopez-1) on My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI · 2023-05-25T04:11:06.273Z · LW · GW

That seems mostly like you don't feel (at least on a gut level) that a rogue GPU cluster in a world where there's an international coalition banning them is literally worse than a (say) 20% risk of a full nuclear exchange.

If instead, it was a rogue nation credibly building a nuclear weapon which would ignite the atmosphere according to our best physics, would you still feel like it was deranged to suggest that we should stop it from being built even at the risk of a conventional nuclear war? (And still only as a final resort, after all other options have been exhausted.)

I can certainly sympathize with the whole dread in the stomach thing about all of this, at least.

Comment by Adele Lopez (adele-lopez-1) on When Science Can't Help · 2023-05-25T03:59:35.442Z · LW · GW

I was thinking that "either it's there or it's not" as applied to a conscious state would imply you don't think consciousness can be in an entangled state, or something along those lines.

But reading it again, it seem like you are saying consciousness is discontinuous? As in, there are no partially-conscious states? Is that right?

I'm also unaware of a fully satisfactory ontology for relativistic QFT, sadly.

Comment by Adele Lopez (adele-lopez-1) on Open Thread With Experimental Feature: Reactions · 2023-05-24T18:54:21.183Z · LW · GW

Maybe the "muddled" react should be renamed to "confused", with the intentional ambiguity as to whether the idea itself seems confused or the reactor just found it confusing because they misunderstood something.

Comment by Adele Lopez (adele-lopez-1) on Open Thread With Experimental Feature: Reactions · 2023-05-24T18:44:11.412Z · LW · GW

I'd really like to be able to see all the reactions at once, if possible.

I think the "I agree to this" react should simply be labeled "Handshake".

Also, a react to indicate that this comment should have been split into multiple comments might be nice (like you may think this comment should have :p).

Comment by Adele Lopez (adele-lopez-1) on When Science Can't Help · 2023-05-23T06:00:59.188Z · LW · GW

If something has an observer-independent existence, then for all possible states, either it's there or it's not.

Should I infer that you don't believe in many worlds?

Comment by Adele Lopez (adele-lopez-1) on When Science Can't Help · 2023-05-23T01:33:16.418Z · LW · GW

What specific reasons do you have to take them seriously?

Comment by Adele Lopez (adele-lopez-1) on Twiblings, four-parent babies and other reproductive technology · 2023-05-21T23:44:32.433Z · LW · GW

Of course this necessitates HAVING other eggs, which we already established are in short supply. But thankfully, those other eggs don't need to come from the same woman. You can get donor eggs without too much trouble. And if you don’t care much about the donors' DNA you can get them for quite a bit less money.

So twiblings using donor eggs would still have mitochondrial DNA from the donor, right?

Comment by Adele Lopez (adele-lopez-1) on carado's Shortform · 2023-05-21T14:34:58.350Z · LW · GW

That... seems like a big part of what having "solved alignment" would mean, given that you have AGI-level optimization aimed at (indirectly via a counter-factual) evaluating this (IIUC).

Comment by Adele Lopez (adele-lopez-1) on Bayesian Networks Aren't Necessarily Causal · 2023-05-21T07:45:39.050Z · LW · GW

Do you know exactly how strongly it favors the true (or equivalent) structure?

Comment by Adele Lopez (adele-lopez-1) on carado's Shortform · 2023-05-21T07:32:43.549Z · LW · GW

Nice graphic!

What stops e.g. "QACI(expensive_computation())" from being an optimization process which ends up trying to "hack its way out" into the real QACI?

Comment by Adele Lopez (adele-lopez-1) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-05-20T20:35:55.289Z · LW · GW

What is the most insightful textbook about nanoelectronics you know of, regardless of how difficult it may be?

Or for another question trying to get at the same thing: if only one book about nanoelectronics were to be preserved (but standard physics books would all be fine still), which one would you want it to be? (I would be happy with a pair of books too, if that's an easier question to answer.)

Comment by Adele Lopez (adele-lopez-1) on New OpenAI Paper - Language models can explain neurons in language models · 2023-05-11T02:56:11.295Z · LW · GW

Those are good questions! There's some existing research which address some of your questions.

Single neurons often do represent multiple concepts:

It seems to still be unclear why the dimensions are aligned with the standard basis:

Comment by Adele Lopez (adele-lopez-1) on New OpenAI Paper - Language models can explain neurons in language models · 2023-05-11T02:16:01.943Z · LW · GW

I would really love to see this combined with a neuroscope so you can play around with the neurons easily and test your hypotheses on what it means!

I also find it pretty fun to try to figure out what a neuron is activating for, and it seems plausible that this is something that could be gamified and crowdsourced (à la FoldIt) to great effect, even without the use of GPT-4 to generate explanations (though it could still be used to validate submitted answers). This probably wouldn't scale to a GPT-3+ sized network, but it might still be helpful at e.g. surfacing interesting neurons, or training an AI to interpret neurons more effectively.

Comment by Adele Lopez (adele-lopez-1) on Properties of Good Textbooks · 2023-05-08T03:54:28.868Z · LW · GW

A property common to many of my favorite textbooks: the author points out what is important to track (especially in ways not already part of the "standard wisdom").

For example (grabbing the textbook nearest to me), The Geometry of Physics by Theodore Frankel is full of statements like:

Since and are diffeomorphic, it might seem that there is no particular reason for introducing the more abstract , but this is not so. There are certain geometrical objects that live naturally on , not .


There is a general rule of thumb concerning forms versus pseudoforms; a form measures an intensity whereas a pseudoform measures a quantity. [...] Our conclusions, however, about intensities and quantities must be reversed when dealing with a pseudo-quantity, i.e., a quantity whose sign reverses when the orientation of space is reversed.

Comment by Adele Lopez (adele-lopez-1) on Shortform · 2023-04-28T04:11:34.217Z · LW · GW

Maybe? I was not trying to answer the object level question either way, but instead just pointing out what sort of evidence there might be that could answer this.

Comment by Adele Lopez (adele-lopez-1) on Shortform · 2023-04-28T03:55:47.177Z · LW · GW

I think a similar type of financial fraud is often detectable via violations of Benford's law. Or more generally, it's hard to fake the right distribution. As another case of that principle, you'd expect the discrepancy between polls and results to fall within a predictable distribution if they were sampling from the same space.
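The leading-digit check mentioned above can be sketched in a few lines. This is an illustrative chi-squared-style deviation score I'm making up for the example, not a statistically rigorous fraud test:

```python
import math
from collections import Counter

def benford_deviation(values):
    """Chi-squared-style deviation of leading-digit frequencies from Benford's law.
    Large values suggest the numbers were not produced by a natural multiplicative
    process -- the classic red flag for fabricated financial figures."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    n = len(digits)
    counts = Counter(digits)
    dev = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford: P(d) = log10(1 + 1/d)
        dev += (counts.get(d, 0) - expected) ** 2 / expected
    return dev

# Exponential growth follows Benford closely; uniform "fakes" deviate badly.
powers = [2 ** k for k in range(1, 200)]
fakes = list(range(400, 1000))
print(benford_deviation(powers), benford_deviation(fakes))
```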

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on Doom from Foom #2 · 2023-04-28T00:14:42.629Z · LW · GW

So it looks like CMOS adiabatic circuits are an existing technology which appears to lie in the space between conventional and reversible computation. According to Wikipedia, they take up 50% more area (unclear if that refers to ~transistor size or ~equivalent computation unit size). It seems plausible that you could still use this to get denser compute overall, since you could stack them in 3D more densely without excess heat being as much of a problem.

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on Doom from Foom #2 · 2023-04-27T05:07:33.071Z · LW · GW

Is there really such a strong line between standard computing and reversible computing? As I understand it, you usually have a bunch of bits you don't care about after doing a reversible computation. So you either have to store these bits somewhere indefinitely, or eventually erase them radiating heat. That makes it possible to reframe a reversible computer as one in which you perfectly cool/remove the heat generated via computation (and maybe dissipate the saved bits far away or whatever). Under this reframe, you can see how we could potentially have really good but imperfect cooling which approaches this ideal (and makes me think it's not a coincidence that good electrical conductors tend to be good heat conductors). Now, there might still be a "soft line" which makes approaching this hard in practice, like the clock issue you mention, but maybe it is possible to incrementally advance current semiconductor tech to the reversible computing limit or at least get pretty close.

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on Doom from Foom #2 · 2023-04-27T03:32:58.901Z · LW · GW

Kudos on taking the time to tighten and restate your argument! I'd like to encourage more of this from alignment researchers; it seems likely to save lots of time talking past each other and getting mired in disagreements over non-cruxy points.

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on AI Doom · 2023-04-24T06:09:32.074Z · LW · GW

That is true, and I concede that that weakens my point.

It still seems to be the case that you could get a ~35% efficiency increase by operating in e.g. Antarctica. I also have an intuition, which I'll need to think more about, that there are trade-offs with the Landauer limit that could yield substantial gains by separating things that are biologically constrained to be close together... similar to how a human with an air conditioner can thrive in much hotter environments (using more energy overall, but not energy that has to be in thermal contact with the brain via e.g. the same circulatory system).
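As a rough sanity check of the ~35% figure: the Landauer limit E = k_B · T · ln(2) scales linearly with temperature, so the gain from cooling is just the temperature ratio. The specific temperatures below are my own assumptions, not from the thread:

```python
# Landauer energy per erased bit scales linearly with temperature, so cooling
# from body temperature (~310 K) to an Antarctic winter (~230 K, my assumption)
# reduces the minimum energy per erased bit proportionally.
T_body, T_antarctica = 310.0, 230.0
efficiency_gain = T_body / T_antarctica - 1
print(f"{efficiency_gain:.0%}")  # ~35%
```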

Comment by Adele Lopez (adele-lopez-1) on Chu are you? · 2023-04-24T03:50:36.761Z · LW · GW


For the poset example, I'm using Chu spaces with only 2 colors. I'm also not thinking of the rows or columns of a Chu space as having an ordering (they're sets), you can rearrange them as you please and have a Chu space representing the same structure.

I would suggest reading through to the ## There and Back Again section and in particular while trying to understand how the other poset examples work, and see if that helps the idea click. And/or you can suggest another coloring you think should be possible, and I can tell you what it represents.

Comment by Adele Lopez (adele-lopez-1) on Adele Lopez's Shortform · 2023-04-24T02:48:37.326Z · LW · GW

[Public Draft v0.0] AGI: The Depth of Our Uncertainty

[The intent is for this to become a post making a solid case for why our ignorance about AGI implies near-certain doom, given our current level of capability:alignment efforts.]

[I tend to write lots of posts which never end up being published, so I'm trying a new thing where I will write a public draft which people can comment on, either to poke holes or contribute arguments/ideas. I'm hoping that having any engagement on it will strongly increase my motivation to follow through with this, so please comment even if just to say this seems cool!]

[Nothing I have planned so far is original; this will mostly be exposition of things that EY and others have said already. But it would be cool if thinking about this a lot gives me some new insights too!]

Entropy is Uncertainty

Given a model of the world, there are lots of possibilities that satisfy that model, over which our model implies a distribution.

There is a mathematically inevitable way to quantify the uncertainty latent in such a model, called entropy.
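Concretely, for a discrete distribution this is Shannon entropy, which is zero when the model is certain and maximal when all outcomes are equally likely:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in dist if p > 0)

# A model certain of the outcome has zero uncertainty...
print(entropy([1.0]))       # 0.0
# ...while maximal ignorance over 4 outcomes carries 2 bits of uncertainty.
print(entropy([0.25] * 4))  # 2.0
```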

A model is subjective in the sense that it is held by a particular observer, and thus entropy is subjective in this sense too. [Obvious to Bayesians, but worth spending time on as it seems to be a common sticking point]

This is in fact the same entropy that shows up in physics!

Engine Efficiency

But wait, that implies that temperature (defined from entropy) is subjective, which is crazy! After all, we can measure temperature with a thermometer. Or define it as the average kinetic energy of the particles (for a monatomic gas; in other cases you also need the potential energy from the bonds)! Those are both objective in the sense of not depending on the observer.

That is true, as those are slightly different notions of temperature. The objective measurement is the one important for determining whether something will burn your hand, and thus is the one which the colloquial sense of temperature tracks. But the definition via entropy is actually more useful, and it's more useful precisely because we can wring some extra advantage from the fact that it is subjective.

And that's because it is this notion of temperature which governs the use of an engine. Without the subjective definition, we merely get the law governing heat engines. As a simple intuition, consider that you happen to know that your heat source doesn't just have molecules moving randomly, but that they are predominantly moving back and forth along a particular axis at a specific frequency. A thermometer attached to this source may read the same temperature as one attached to an ordinary heat bath with the same amount of energy (mediated by phonon dissipation), and yet it would be simple to exceed the Carnot limit simply by using a non-heat engine which takes advantage of the vibrational mode!

Say that this vibrational mode was hidden or hard to notice. Then someone with the knowledge of it would be able to make a more effective engine, and therefore extract more work, than someone who hadn't noticed.
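For reference, the limit the ignorant engineer is stuck with is the ordinary Carnot bound, which depends only on the two reservoir temperatures (the numbers below are just illustrative):

```python
def carnot_efficiency(T_hot, T_cold):
    """Maximum fraction of heat convertible to work by any heat engine
    operating between two thermal reservoirs (temperatures in kelvin)."""
    return 1.0 - T_cold / T_hot

# A heat engine between 600 K and 300 K reservoirs can convert at most
# half of the heat it draws into work; the rest must be dumped as waste heat.
print(carnot_efficiency(600.0, 300.0))  # 0.5
```

Someone who notices the hidden vibrational mode isn't bound by this number, because their effective "temperature" for the source, computed from their lower-entropy model, is different.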

Another example is Maxwell's demon. In this case, the demon has less uncertainty over the state of the gas than someone at the macro-level, and is thereby able to extract more work from the same gas.

But perhaps the real power of this subjective notion of temperature comes from the fact that the Carnot limit still applies with it, but now generalized to any kind of engine! This means that there is a physical limit on how much work can be extracted from a system which directly depends on your uncertainty about the system!! [This argument needs to actually be fleshed out for this post to be convincing, I think...]
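One quantitative face of this limit is the Szilard-engine bound: each bit of information about a system at temperature T lets you extract at most kT ln 2 of work, mirroring the Landauer erasure cost. A minimal sketch, assuming a 300 K gas:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K

def max_work_from_bits(bits, T):
    """Szilard-engine bound: maximum work (J) extractable from a system at
    temperature T given `bits` of information about its microstate."""
    return bits * k_B * T * math.log(2)

# A Maxwell's-demon-style engine that knows one bit about a gas at 300 K
# (assumed) can extract at most about 2.87e-21 joules per cycle.
print(f"{max_work_from_bits(1, 300.0):.3g} J")
```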

The Work of Optimization

[Currently MUCH rougher than the above...]

Hopefully now, you can start to see the outlines of how it is knowable that

Try to let go of any intuitions about "minds" or "agents", and think about optimizers in a very mechanical way.

Physical work is about the energy necessary to change the configuration of matter.

Roughly, you can factor an optimizer into three parts: The Modeler, the Engine, and the Actuator. Additionally, there is the Environment the optimizer exists within and optimizes over. The Modeler models the optimizer's environment - decreasing uncertainty. The Engine uses this decreased uncertainty to extract more work from the environment. The Actuator focuses this work into certain kinds of configuration changes.

[There seems to be a duality between the Modeler and the Actuator which feels very important.]


Gas Heater

  • It is the implicit knowledge of the location, concentration, and chemical structure of a natural gas line that allows the conversion of the natural gas and the air in the room from a state of both being at the same low temperature to a state where the air is at a higher temperature and the gas has been burned.

-- How much work does it take to heat up a room?
-- How much uncertainty is there in the configuration state before and after combustion?
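For the first question, a back-of-envelope estimate (all numbers are assumed for illustration: a 50 m³ room, warming the air by 10 K, ignoring walls and leaks):

```python
# Rough heat needed to warm the air in a room (assumed illustrative numbers).
room_volume = 50.0  # m^3, e.g. a 4 m x 5 m x 2.5 m room (assumed)
air_density = 1.2   # kg/m^3, air at roughly room conditions
c_p_air = 1005.0    # J/(kg*K), specific heat of air at constant pressure
delta_T = 10.0      # K, desired temperature rise (assumed)

heat = room_volume * air_density * c_p_air * delta_T
print(f"{heat / 1e3:.0f} kJ")  # roughly 600 kJ, ignoring walls, furniture, leaks
```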

This brings us to an important point. A gas heater still works with no one around to be modeling it. So how is any of the subjective entropy stuff relevant? Well, from the perspective of no one, the room is simply in one of a plethora of possible states before, and in another of those possible states after, just like any other physical process anywhere. It is only because we find it somehow relevant that the room is hotter after than before that thermodynamics comes into play. The universe doesn't need thermodynamics to make atoms bounce around; we need it to understand, and even to recognize, the interesting difference.



Natural Selection

Chess Engine



Why Orthogonality?

[More high level sections to come]