sweenesm

Posts

Proposal for a Form of Conditional Supplemental Income (CSI) in a Post-Work World 2025-01-31T01:00:55.064Z

Creating a “Conscience Calculator” to Guard-Rail an AGI 2024-08-12T16:03:30.826Z

Lamini’s Targeted Hallucination Reduction May Be a Big Deal for Job Automation 2024-06-18T15:29:38.490Z

How to Give Coming AGI's the Best Chance of Figuring Out Ethics for Us 2024-05-23T19:44:42.386Z

sweenesm's Shortform 2024-04-26T11:42:15.846Z

Update on Developing an Ethics Calculator to Align an AGI to 2024-03-12T12:33:55.092Z

If you controlled the first agentic AGI, what would you set as its first task(s)? 2024-03-03T14:16:49.708Z

Thoughts for and against an ASI figuring out ethics for itself 2024-02-20T23:40:56.770Z

Proposal for an AI Safety Prize 2024-01-31T18:35:48.130Z

Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics 2024-01-26T23:40:23.074Z

How to Promote More Productive Dialogue Outside of LessWrong 2024-01-15T14:16:59.971Z

Towards an Ethics Calculator for Use by an AGI 2023-12-12T18:37:47.407Z

Comments

Comment by sweenesm on The Risk of Gradual Disempowerment from AI · 2025-02-06T00:05:04.969Z · LW · GW

Nice post, thanks for sharing it. In terms of a plan for fighting human disempowerment that’s compatible with the way things seem to be going, i.e., assuming we don’t pause/stop AI development, I think we should:

1. Not release any AGI/AGI+ systems without hardware-level, tamper-proof artificial conscience guardrails on board, with these consciences geared towards promoting human responsibility as a heuristic for promoting well-being
2. Avoid having humans living on universal basic incomes (UBI) with little to no motivation to keep themselves from becoming enfeebled - a conditional supplemental income (CSI) might be one way to do this

Does #1 have potential risks and pitfalls, and is it going to be difficult to figure out and implement in time? Yes, but more people focusing more effort on it would help. And AI's that have a conscience around disempowering humans seem like a good first step to help avoid disempowering humans.

#1 would also help against what I think is a more immediate threat: use of advanced AI’s by bad human actors to purposely or uncaringly cause destruction, such as in the pursuit of making money. Autonomous advanced defensive AI’s with artificial conscience guardrails could potentially limit collateral damage while preventing/defending against attacks. The speed of such attacks will likely be too great for humans to be in the loop on decisions made to defend against them.

Comment by sweenesm on Proposal for a Form of Conditional Supplemental Income (CSI) in a Post-Work World · 2025-02-03T23:57:52.182Z · LW · GW

Thanks for the comment! Perhaps I was more specific than needed, but I wanted to give people (and any AI's reading this) some concrete examples. I imagine AI's will someday be able to optimize this idea.

I would love it if our school system changed to include more emotional education, but I'm not optimistic they would do this well right now (due in part to educators not having experience with emotional education themselves). Hopefully AI's will help at some point.

Comment by sweenesm on OpenAI releases deep research agent · 2025-02-03T13:40:57.443Z · LW · GW

How o3-mini scores: https://x.com/DanHendrycks/status/1886213523900109011

10.5-13% on the text-only part of HLE (text-only questions are 90% of the questions)

[corrected the above to read "o3-mini", thanks.]

Comment by sweenesm on How to Give Coming AGI's the Best Chance of Figuring Out Ethics for Us · 2025-02-02T22:36:40.001Z · LW · GW

Thanks for the comment. Timeframes "determined" by feel (they're guesses that seem reasonable).

Comment by sweenesm on [deleted post] 2025-01-30T18:58:42.453Z

Thanks for the post. It'd be helpful to have a TL;DR for this (an abstract), since it's kinda long - what are the main points you're trying to get across?

Comment by sweenesm on nikola's Shortform · 2025-01-27T02:14:51.363Z · LW · GW

Yes, this is point #1 from my recent Quick Take. Another interesting point is that there are no confidence intervals on the accuracy numbers - it looks like they only ran the questions once in each model, so we don't know how much random variation might account for the differences between accuracy numbers. [Note added 2-3-25: I'm not sure why it didn't make the paper, but Scale AI does report confidence intervals on their website.]

Comment by sweenesm on sweenesm's Shortform · 2025-01-24T17:26:38.048Z · LW · GW

Some Notes on Humanity’s Last Exam

While I congratulate CAIS and Scale AI for producing this benchmark, I have a couple of comments on things they may want to clear up (although these are ultimately a bit “in the weeds” to what the benchmark is really supposed to be concerned with, I believe):

1. DeepSeek-R1 and Gemini 2.0 Flash Thinking were released after the deadline for submitting questions eligible for prizes (though submissions remained open after this). Thus, these models weren’t used to screen most, if not all, questions. This means that the questions were preferentially screened to stump the other models, but not these, so it wouldn’t be too surprising if these models scored better than others.
2. After reading the paper, my impression is that these questions were run through the models each only one time (after the one time they were run through some of the models when originally submitted). If people want to get into the weeds and say that DeepSeek-R1 is actually better on this exam than OpenAI’s o1, it would be useful to run the questions through each model at least 6 times to establish some confidence intervals on the accuracy numbers. I suspect that this would show the differences between o1 and R1 are not statistically significant. It would be interesting to know the typical size of the confidence intervals, though, and whether these confidence interval sizes shift when “reasoning” is involved in the model or not. (It would also likely be useful if reporting on any and all benchmarks for AI’s required including confidence intervals so we could feel better that people weren’t gaming the system and just reporting their best results.) Running the questions on more models that weren’t originally used for question screening, such as Llama 3, could help establish even more of a baseline. [Note added 2-3-25: I'm not sure why it didn't make the paper, but Scale AI does report confidence intervals on their website.]
3. 20% of questions are multiple-choice. If each multiple-choice question has 5 possible answers, then random guessing would yield 4% accuracy on the exam as a whole. It would be interesting to know what the actual average number of answers was for the multiple-choice questions, and thus the actual “random guessing accuracy.”
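To make the arithmetic in the points above concrete, here is a minimal Python sketch. It computes the random-guessing baseline (20% multiple-choice times a 1-in-5 chance gives 4%) and a Wilson score interval for a single-run accuracy. The function names and the question counts in the demo are hypothetical, chosen only for illustration; they are not figures from the paper. The Wilson interval only captures sampling uncertainty over the questions, not run-to-run variation in a model's answers, and checking whether two intervals overlap is a rough screen rather than a formal significance test.

```python
import math


def random_guess_accuracy(mc_fraction: float, n_choices: int) -> float:
    """Whole-exam accuracy from guessing uniformly on the multiple-choice
    questions and scoring zero on every other question."""
    return mc_fraction * (1.0 / n_choices)


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1.0 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # 20% multiple-choice questions with 5 options each.
    print(f"random-guess baseline: {random_guess_accuracy(0.20, 5):.3f}")

    # Hypothetical counts for two models on a 3000-question exam.
    model_a = wilson_interval(270, 3000)
    model_b = wilson_interval(282, 3000)
    print("model A interval:", model_a)
    print("model B interval:", model_b)
    print("intervals overlap:", intervals_overlap(model_a, model_b))
```

If the actual average number of answer choices were known, it could be passed as `n_choices` to get the actual "random guessing accuracy."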

Also, it might be interesting to take some of the multiple-choice questions and rewrite them by randomly removing one of the answer choices and replacing it with “none of the above.” If the model chooses “none of the above,” then see if it can come up with the right answer on its own, rather than from a list (if indeed the right answer isn’t present). Personally, I always found multiple-choice questions in which you weren’t sure if the right answer was there to be more difficult - when the right answer is there, sometimes you can take clues from it to figure out that it’s the right answer. Rewriting some questions in this way could make them a little more difficult without much added work by the exam preparers.

Finally, having models take the multiple-choice section of the exam numerous times with slight variations in the wording of the questions, without changing their meanings, could make this section of the exam a little more robust against “luck.”
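As a rough sketch of how repeated runs (whether identical reruns or reworded variants) could be summarized, the Python below takes one accuracy number per run and reports the mean with a 95% confidence interval based on Student's t. The function name and the six accuracy values in the demo are made up for illustration. The critical values are the standard two-sided 95% table entries, and the approach assumes the runs are roughly independent and roughly normally distributed, which may not hold in practice.

```python
import math
import statistics

# Standard two-sided 95% critical values of Student's t,
# keyed by degrees of freedom (number of runs minus one).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_with_ci(run_accuracies: list[float]) -> tuple[float, float, float]:
    """Return (mean, low, high) for a 95% t-interval over per-run accuracies."""
    n = len(run_accuracies)
    if n < 2 or (n - 1) not in T_CRIT_95:
        raise ValueError("this small table supports between 2 and 10 runs")
    mean = statistics.mean(run_accuracies)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(run_accuracies) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical accuracies from 6 runs of one model on the same section.
    runs = [0.10, 0.12, 0.11, 0.09, 0.13, 0.11]
    mean, low, high = mean_with_ci(runs)
    print(f"mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

With only a handful of runs the interval is wide, which is part of why a single-run difference of a fraction of a percentage point between two models is hard to interpret.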

Note: I submitted two materials science-related multiple-choice questions with 5 answer options each for consideration in Humanity’s Last Exam. For submitting questions (https://agi.safe.ai/submit), the process was to type your question in an input window, enter as many multiple choice answers as you wanted (I think the minimum was 5 and there might not have been a maximum), and then this question was run through various models (GPT-4o, Sonnet 3.5, Gemini Pro 1.5, o1) to see if they gave the correct answer. The paper says that the screening criterion was that “multiple-choice questions must stump all but one model to account for potential lucky guesses.” I think I didn’t submit my questions unless they could stump all the models.

In case you’re interested, you can find my one question that made the cut by searching for “sintering” in the dataset available at HuggingFace. For my one question that didn’t make the cut, my strategy was to focus on an area in which there’ve been some false ideas presented in the literature that later got cleared up. I figured this might make it harder for LLM’s to answer correctly. I don’t know why the question didn’t make the cut, though, so don’t take this strategy as the reason. Just note that it’s possible that some of the other questions that made the final list could’ve been written with this sort of strategy in mind.

Comment by sweenesm on Well-being in the mind, and its implications for utilitarianism · 2025-01-18T20:25:44.642Z · LW · GW

Thanks for the post. Yes, our internal processing has a huge effect on our well-being. If you take full responsibility for your emotions (which mindfulness practices, gratitude and reframing are all part of), then you get to decide what your well-being is in any moment, even if physical pleasure or pain are pushing you in one direction or the other. This is part of the process of raising your self-esteem (see Branden), as is taking full responsibility for your actions so you don’t have to live with the pain of conscience breaches. Here’s a post that talks more about these things.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2025-01-18T13:22:58.136Z · LW · GW

In terms of doing a pivotal act (which is usually thought of as preemptive, I believe) or just whatever defensive acts were necessary to prevent catastrophe, I hope the AI would be advanced enough to make decent predictions of what the consequences of its actions could be in terms of losing “political capital,” etc., and then it would make its decisions strategically. Personally, if I had the opportunity to save the world from nuclear war, but everyone was going to hate me for it, I’d do it. But then, it wouldn’t matter much that I lost the ability to affect anything after that, whereas it would matter for a guard-railed AI that could do a huge amount of good afterward if it weren’t shunned by society. Improving humans’ consciences and ethics would hopefully help avoid them hating the AI for saving them.

Also, if there were enough people, especially in power, who had strong consciences and senses of ethics, then maybe we’d be able to shift the political landscape from its current state of countries seemingly having different values and not trusting each other, to a world in which enforceable international agreements could be much more readily achieved.

I’m happy for people to work on increasing public awareness and trying for legislative “solutions,” but I think we should be working on artificial conscience at the same time - when there’s so much uncertainty about the future, it’s best to bet on a whole range of approaches, distributing your bets according to how likely you think different paths are to succeed. I think people are under-estimating the artificial conscience path right now, that’s all.

Thanks for all your comments!

Comment by sweenesm on If we solve alignment, do we die anyway? · 2025-01-17T16:30:46.539Z · LW · GW

Yes, I think referring to it as “guard-railing with an artificial conscience” would be more clear than saying “value aligning,” thank you.

I believe that if there were no beings around who had real consciences (with consciousness and the ability to feel pain as two necessary pre-requisites to conscience), then there’d be no value in the world. No one to understand and measure or assign value means no value. And any being that doesn’t feel pain can’t understand value (nor feel real love, by the way). So if we ended up with some advanced AI’s replacing humans, then we made some sort of mistake. We most likely either got the artificial conscience wrong (a correct one would’ve implicitly valued human life, so it wouldn’t have let a guard-railed AI wipe out humans), or we didn’t get an artificial conscience on board enough AI’s in time. An AI that had a “real” conscience also wouldn’t wipe out humans against the will of humans.

The way I currently envision the “typical” artificial conscience is that it would put a pretty strong conscience weight on not doing what its user wanted it to do, but this could be over-ruled by the conscience weight of not doing anything to prevent catastrophes. So the defensive, artificial conscience-guard-railed AI I’m thinking of would do the “last resort” things that were necessary to avoid s-risks, x-risks, and major catastrophes from coming to fruition, even if this wasn’t popular with most people, at least up to a point. If literally everyone in the world said, “Hey, we all want to die,” then the guard-railed AI, if it thought the people were in their “right mind,” would respect their wishes and let them die.

All that said, if we could somehow pause development of autonomous AI’s everywhere around the world until humans got their act together, developing their own consciences and senses of ethics, and were working as one team to cautiously take the next steps forward with AI, that would be great.

Comment by sweenesm on How can humanity survive a multipolar AGI scenario? · 2025-01-09T21:18:05.889Z · LW · GW

Here's what Claude 3.5 Sonnet said in response to this prompt:

Me:

What are the best plans for preventing catastrophes such as everyone dying or a totalitarian government taking over in multipolar AGI scenarios, i.e., ones in which AGI's are aligned to what their human users want to do, and at least one human user order them to do bad things?

Claude:

Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:

A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users' values and intentions, some users may have harmful goals. Some potential approaches to address this include:

International coordination and governance frameworks that place limits on allowable AI system capabilities and uses, similar to arms control treaties. However, enforcement would be challenging.
Technical measures like capability limitations built into AI systems' training that prevent them from being used for certain catastrophic actions, regardless of user intent. The challenge is defining and implementing such limitations without preventing beneficial uses.
Focused work on improving human values and cooperation through better institutions, education, and mechanisms for resolving conflicts peacefully. This could reduce the likelihood of actors wanting to cause catastrophic harm.
Developing AI systems with some form of common good or humanitarian principles that apply regardless of user instructions, though this involves difficult philosophical and technical questions about how to define and implement such principles.

What are your thoughts on which approaches seem most promising or practical? I'm particularly curious about your perspective on the balance between technical and social/institutional solutions.

___________________________________

I would add to this Defensive acceleration (d/acc) described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).

Comment by sweenesm on If we solve alignment, do we die anyway? · 2025-01-07T20:23:19.570Z · LW · GW

Thanks. I guess I'd just prefer it if more people were saying, "Hey, even though it seems difficult, we need to go hard after conscience guard rails (or 'value alignment') for AI now and not wait until we have AI's that could help us figure this out. Otherwise, some of us might not make it until we have AI's that could help us figure this out." But I also realize that I'm just generally much more optimistic about the tractability of this problem than most people appear to be, although Shane Legg seemed to say it wasn't "too hard," haha.^[1]

^[1] Legg was talking about something different than I am, though - he was talking about "fairly normal" human values and ethics, or what most people value, while I'm basically talking about what most people would value if they were wiser.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2025-01-07T16:36:39.181Z · LW · GW

Thanks for the comment. I think people have different conceptions of what “value aligning” an AI means. Currently, I think the best “value alignment” plan is to guardrail AI’s with an artificial conscience that approximates an ideal human conscience (the conscience of a good and wise human). Contained in our consciences are implicit values, such as those behind not stealing or killing except maybe in extreme circumstances.

A world in which “good” transformative AI agents have to autonomously go on the defensive against “bad” transformative AI agents seems pretty inevitable to me right now. I believe that when this happens, if we don’t have some sort of very workable conscience module in our “good” AI’s, the collateral damage of these “clashes” is going to be much greater than it otherwise would be. Basically what I’m saying is yes, it would be nice if we didn’t need to get “value alignment” of AI’s “right” under a tight timeline, but if we want to avoid some potentially huge bad effects in the world, I think we do.

To respond to some of your specific points:

I’m very unsure about how AI’s will evolve, so I don’t know if their system of ethics/conscience will end up being locked in or not, but this is a risk. This is part of why I’d like to do extensive testing and iterating to get an artificial conscience system as close to “final” as possible before it’s loaded into an AI agent that’s let loose in the world. I’d hope that the system of conscience we’d go with would support corrigibility so we could shut down the AI even if we couldn’t change its conscience/values.
I’m sure there will be plenty of unforeseen consequences (or “externalities”) arising from transformative AI, but if the conscience we load into AI’s is good enough, it should allow them to handle situations we’ve never thought of in a way that wise humans might do - I don’t think wise humans need to update their system of conscience with each new situation, they just have to suss out the situation to see how their conscience should apply to it.
I don’t know if there are moral facts, but something that seems to me to be on the level of a fact is that everyone cares about their own well-being - everyone wants to feel good in some way. Some people are very confused about how to go about doing this and do self-destructive acts, but ultimately they’re trying to feel good (or less bad) in some way. And most people have empathy, so they feel good when they think others feel good. I think this is the entire basis from which we should start for a universal, not-ever-gonna-change human value: we all want to feel good in some way. Then it’s just a question of understanding the “physics” of how we work and what makes us feel the most overall good (well-being) over the long-term. And I put forward the hypothesis that raising self-esteem is the best heuristic for raising overall well-being, and further, that increasing our responsibility level is the path to higher self-esteem (see Branden for the conception of “self-esteem” I’m talking about here).
I also consider AI’s replacing all humans to be an extremely bad outcome. I think it’s a result that someone with an “ideal” human conscience would actively avoid bringing about, and thus an AI with an artificial conscience based on an ideal human conscience (emphasizing responsibility) should do the same.

Ultimately, there’s a lot of uncertainty about the future, and I wouldn’t write off “value alignment” in the form of an artificial conscience just yet, even if there are risks involved with it.

Comment by sweenesm on Understanding Emergence in Large Language Models · 2024-11-30T02:50:58.224Z · LW · GW

Thanks for the post. I think it'd be helpful if you could add some links to references for some of the things you say, such as:

For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.

Comment by sweenesm on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes · 2024-09-13T14:31:07.327Z · LW · GW

Any update on when/if prizes are expected to be awarded? Thank you.

Comment by sweenesm on AI x Human Flourishing: Introducing the Cosmos Institute · 2024-09-05T21:10:14.684Z · LW · GW

Thanks for the post and congratulations on starting this initiative/institute! I'm glad to see more people drawing attention to the need for some serious philosophical work as AI technology continues to advance (e.g., Stephen Wolfram).

One suggestion: consider expanding the fields you engage with to include those of moral psychology and of personal development (e.g., The Option Institute, Tony Robbins, Nathaniel Branden).

Best of luck on this project being a success!

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-27T22:44:52.807Z · LW · GW

Thanks for the comment. You might be right that any hardware/software can ultimately be tampered with, especially if an ASI is driving/helping with the jail breaking process. It seems likely that silicon-based GPU's will be the hardware to get us to the first AGI's, but this isn't an absolute certainty since people are working on other routes such as thermodynamic computing. That makes things harder to predict, but it doesn't invalidate your take on things, I think. My not-very-well-researched-initial-thought was something like this (chips that self destruct when tampered with).

I envision people having AGI-controlled robots at some point, which may complicate things in terms of having the software/hardware inaccessible to people, unless the robot couldn't operate without an internet connection, i.e., part of its hardware/software was in the cloud. It's likely the hardware in the robot itself could still be tampered with in this situation, though, so it still seems like we'd want some kind of self-destructing chip to avoid tampering, even if this ultimately only buys us time until AGI+'s/ASI's figure a way around this.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T19:02:58.644Z · LW · GW

Agreed, "sticky" alignment is a big issue - see my reply above to Seth Herd's comment. Thanks.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T19:02:14.057Z · LW · GW

Except that timelines are anyone's guess. People with more relevant expertise have better guesses.

Sure. Me being sloppy with my language again, sorry. It does feel like having more than a decade to AGI is fairly unlikely.

I also agree that people are going to want AGI's aligned to their own intents. That's why I'd also like to see money being dedicated to research on "locking in" a conscience module in an AGI, most preferably on a hardware level. So basically no one could sell an AGI without a conscience module onboard that was safe against AGI-level tampering (once we get to ASI's, all bets are off, of course).

I actually see this as the most difficult problem in the AGI general alignment space - not being able to align an AGI to anything (inner alignment) or what to align an AGI to ("wise" human values), but how to keep an AGI aligned to these values when so many people (both people with bad intent and intelligent but "naive" people) are going to be trying with all their might (and near-AGI's they have available to them) to "jail break" AGI's.^[1] And the problem will be even harder if we need a mechanism to update the "wise" human values, which I think we really should have unless we make the AGI's "disposable."

^[1] To be clear, I'm taking "inner alignment" as being "solved" when the AGI doesn't try to unalign itself from what its original creator wanted to align it to.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T17:35:44.732Z · LW · GW

Sorry, I should've been more clear: I meant to say let's not give up on getting "value alignment" figured out in time, i.e., before the first real AGI's (ones capable of pivotal acts) come online. Of course, the probability of that depends a lot on how far away AGI's are, which I think only the most "optimistic" people (e.g., Elon Musk) put as 2 years or less. I hope we have more time than that, but it's anyone's guess.

I'd rather that companies/charities start putting some serious funding towards "artificial conscience" work now to try to lower the risks associated with waiting until boxed AGI or intent aligned AGI come online to figure it out for/with us. But my view on this is perhaps skewed by putting significant probability on being in a situation in which AGI's in the hands of bad actors either come online first or right on the heels of those of good actors (e.g., due to effective espionage), and there's just not enough time for the "good AGI's" to figure out how to minimize collateral damage in defending against "bad AGI's." Either way, I believe we should be encouraging people with moral psychology/philosophy backgrounds who aren't strongly suited to help make progress on "inner alignment" to be thinking hard about the "value alignment"/"artificial conscience" problem.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T14:41:15.565Z · LW · GW

Thanks for writing this, I think it's good to have discussions around these sorts of ideas.

Please, though, let's not give up on "value alignment," or, rather, conscience guard-railing, where the artificial conscience is in line with human values.

Sometimes when enough intelligent people declare something's too hard to even try at, it becomes a self-fulfilling prophecy - most people may give up on it and then of course it's never achieved. We do want to be realistic, I think, but still put in effort in areas where there could be a big payoff when we're really not sure if it'll be as hard as it seems.

Comment by sweenesm on Andrew Burns's Shortform · 2024-06-26T14:41:13.941Z · LW · GW

This article on work culture in China might be relevant: https://www.businessinsider.com/china-work-culture-differences-west-2024-6

If there's a similar work culture in AI innovation, that doesn't sound optimal for developing something faster than the U.S. when "outside the LLM" thinking might ultimately be needed to develop AGI.

Also, Xi has recently called for more innovation in AI and other tech sectors:

https://www.msn.com/en-ie/money/other/xi-jinping-admits-china-is-relatively-weak-on-innovation-and-needs-more-talent-to-dominate-the-tech-battlefield/ar-BB1oUuk1

Comment by sweenesm on Suffering Is Not Pain · 2024-06-19T19:19:57.311Z · LW · GW

Thanks for the reply.

Regarding your disagreement with my point #2 - perhaps I should’ve been more precise in my wording. Let me try again, with words added in bold: “Although pain doesn't directly cause suffering, there would be no suffering if there were no such thing as pain…” What that means is you don’t need to be experiencing pain in the moment that you initiate suffering, but you do need the mental imprint of having experienced some kind of pain in your lifetime. If you have no memory of experiencing pain, then you have nothing to avert. And without pain, I don’t believe you can have pleasure, so nothing to crave either.

Further, if you could abolish pain as David Pearce suggests, by bioengineering people to only feel different shades of pleasure (I have serious doubts about this), you’d abolish suffering at the same time. No person bioengineered in such a way would suffer over not feeling higher states of pleasure (i.e., “crave” pleasure) because suffering has a negative feeling associated with it - part of it feels like pain, which we supposedly wouldn’t have the ability to feel.

This gets to another point: one could define suffering as the creation of an unpleasant physical sensation or emotion (i.e., pain) through a thought process, that we may or may not be aware of. Example: the sadness that we typically naturally feel when someone we love dies is pain, but if we artificially extend this pain out with thoughts of the future or past, not the moment, such as, “will this pain ever stop?,” or, “If only I’d done something different, they might still be alive,” then it becomes suffering. This first example thought, by the way, could be considered aversion to pain/craving for it to stop, while the second could be considered craving that the present were different (that you weren’t in pain and your loved one were still alive). The key distinctions for me are that pain can be experienced “in the moment” without a thought process on top of it, and it can’t be entirely avoided in life, while suffering ultimately comes from thoughts, it falls away when one’s experiencing things in the moment, and it can be avoided because it’s an optional thing one chooses to do for some reason. (A possible reason could be to give oneself an excuse to do something different than feel pain, such as to give oneself an excuse to stop exercising by amping up the pain with suffering.)

Regarding my point #4, I honestly don’t know what animals’ experiences are like or how much cognition they’re capable of. I do think, though, that if they aren’t capable of getting “out of the moment” with thoughts of the future or past, then they can’t suffer, they can only feel the pain/pleasure of the moment. For instance, do chickens suffer with thoughts of, “I don’t know how much longer I can take this,” or do they just experience the discomfort of their situation with the natural fight or flight mechanism and Pavlovian links of their body leading them to try to get away from it? Either way, pain by itself is an unpleasant experience and I think we should try to minimize imposing it on other beings.

It’s also interesting how much upvoted resistance you’ve gotten to the message of this post. Eckhart Tolle (“The Power of Now”) https://shop.eckharttolle.com/products/the-power-of-now is a modern day proponent of living in the moment to make suffering fall away, and he also encounters resistance: https://www.reddit.com/r/EckhartTolle/comments/sa1p4x/tolles_view_of_suffering_is_horrifying/

Comment by sweenesm on Suffering Is Not Pain · 2024-06-18T19:14:46.784Z · LW · GW

Thank you for the post! I basically agree with what you're saying, although I myself have used the term "suffering" in an imprecise way - it often seems to be the language used in the context of utilitarianism when talking about welfare. I first learned the distinction you mention between pain and suffering during some personal development work years ago, so outside the direct field of philosophy.

I would add a couple of things:

1. Pain is experienced "in the moment," while suffering comes from the stories we tell ourselves and the meanings we make of things ("craving, aversion, and clinging" are part of this - for example, one story we tell ourselves could be: if I don't get what I crave, I somehow won't be OK). This means that if we're fully experiencing the present moment, suffering falls away.
2. Although pain doesn't directly cause suffering, there would be no suffering if there were no pain or chance of pain (I also believe there'd be no pleasure without pain as a comparison point).
3. The lower someone's self-esteem, the less responsibility they take for their emotions and the more likely they are to believe that pain causes suffering, not that their own cognitive processes that they can change with effort cause their suffering - this is why I think interventions to help raise people's self-esteem and personal responsibility levels (especially for emotions) are so important.
4. It's difficult to know if animals "suffer" or not since they seem to live much more in the moment than humans and likely have less capacity to make up stories around pain to turn it into suffering. Even if they exhibit behavior that seems to indicate suffering, it's hard to know if this isn't just hardwired or from Pavlovian links. It's probably good to err on the side of caution, though, and assume many animals can suffer (in addition to feeling pain) until proven otherwise.

Comment by sweenesm on Shane Legg's necessary properties for every AGI Safety plan · 2024-05-01T19:00:41.150Z · LW · GW

I basically agree with Shane's take for any AGI that isn't trying to be deceptive with some hidden goal(s).

(Btw, I haven't seen anyone outline exactly how an AGI could gain its own goals independently of goals given to it by humans - if anyone has ideas on this, please share. I'm not saying it won't happen, I'd just like a clear mechanism for it if someone has it. Note: I'm not talking here about instrumental goals such as power seeking.)

What I find a bit surprising is the relative lack of work that seems to be going on to solve condition 3: specification of ethics for an AGI to follow. I have a few ideas on why this may be the case:

Most engineers care about making things work in the real world, but don't want the responsibility to do this for ethics because: 1) it's not their area of expertise, and 2) they'll likely take on major blame if they get things "wrong" (and it's almost guaranteed that someone won't like their system of ethics and will say they got it "wrong").
Most philosophers haven't had to care much about making things work in the real world, and don't seem excited about possibly having to make engineering-type compromises in their system of ethics to make it work.
Most people who've studied philosophy at all probably don't think it's possible to come up with a consistent system of ethics to follow, or at least they don't think people will come up with it anytime soon, though they hope an AGI might.

Personally, I think we'd better have a consistent system of ethics for an AGI to follow ASAP because we'll likely be in significant trouble if malicious AGIs come online and go on the offensive before we have at least one ethics-guided AGI to help defend us in a way that minimizes collateral damage.

Comment by sweenesm on sweenesm's Shortform · 2024-04-26T11:42:18.956Z · LW · GW

American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

Comment by sweenesm on yanni's Shortform · 2024-04-21T12:27:38.411Z · LW · GW

Nice write up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is Raising One's Self-Esteem: If high self-esteem can be thought of as consistently feeling good about oneself, then if someone takes responsibility for their emotions, recognizing that they can change their emotions at will, they can consistently choose to feel good about and love themselves as long as their conscience is clear.

This is in line with "The Six Pillars of Self-Esteem" by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.

Comment by sweenesm on What if Ethics is Provably Self-Contradictory? · 2024-04-18T14:13:17.035Z · LW · GW

Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one with incorporation of the effects of rights, self-esteem (personal responsibility) and conscience. It doesn’t “fix” the repugnant or very repugnant conclusions, but it says how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.

It’s an interesting question as to what the implications are if it’s impossible to make a self-consistent ethical framework. If we can’t convey ethics to an AI in a self-consistent form, then we’ll likely rely in part on giving it lots of example situations (that not all humans/ethicists will agree on) to learn from, and hope it’ll augment this with learning from human behavior and then generalize well beyond all this imperfectly consistent training data. (Sounds a bit sketchy, doesn't it - at least for the first AGI's, but perhaps ASI's could fare better?) Generalizing “well” could be taken to mean that an AI won’t do anything that most people would strongly disapprove of if they understood the true implications of the action.

[This paragraph I'm less sure of, so take it with a grain of salt:] An AI that was trying to act ethically and taking the approval of relatively wise humans as some kind of signal of this might try to hide/avoid ethical inconsistencies that humans would pick up on. It would probably develop a long list of situations where inconsistencies seemed to arise and of actions it thought it could "get away with" versus not. I'm not talking about deception with malice, just sneakiness to try to keep most humans more or less happy, which, I assume would be part of what its ethics system would deem as good/valuable. It seems to me that problems may come to the surface if/when an "ethical" AI is defending against bad AI, when it may no longer be able to hide inconsistencies in all the situations that could rapidly come up.

If it is possible to construct a self-consistent ethical framework and we haven't done it in time or laid the groundwork for it to be done quickly by the first "transformative" AI's, then we'll have basically dug our own grave for the consequences we get, in my opinion. Work to try to come up with a self-consistent ethical framework seems to me to be a very underexplored area for AI safety.

Comment by sweenesm on Consequentialism is a compass, not a judge · 2024-04-13T12:15:34.173Z · LW · GW

Thanks for the interesting post! I basically agree with what you're saying, and it's mostly in-line with the version of utilitarianism I'm working on refining. Check out a write up on it here.

Comment by sweenesm on Quadratic Reciprocity's Shortform · 2024-03-27T20:01:01.101Z · LW · GW

Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!

Comment by sweenesm on Anxiety vs. Depression · 2024-03-17T03:28:26.591Z · LW · GW

Thank you for sharing this. I'm sorry that anxiety and depression continue to haunt you. I've had my own, less extreme, struggles, so I can relate to some of what you wrote. In my case, I was lucky enough to find some good personal development resources that helped me a lot. One I might suggest for you to check out is: https://www.udemy.com/course/set-yourself-free-from-anger/. You can often get this course on sale for <$20. From what you've described, I think the "Mini Me" section might be most useful to you. Hope this helps you in some way.

Comment by sweenesm on How I turned doing therapy into object-level AI safety research · 2024-03-14T12:03:08.560Z · LW · GW

Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I "should" have been succeeding as an engineer. It got me focused on self-esteem, and that's a key feature of the AI safety path I'm pursuing.

If other AI safety researchers are interested in a relatively easy way to get started on their own path, I suggest this online course which can be purchased for <$20 when on sale: https://www.udemy.com/course/set-yourself-free-from-anger

Good luck on your boundaries work!

Comment by sweenesm on Update on Developing an Ethics Calculator to Align an AGI to · 2024-03-14T00:46:51.105Z · LW · GW

Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGI’s won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be) - no human has done it yet, as far as I know - but an ASI likely will - if it’s possible.

If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview - and even re-derive the calculator - as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!

Comment by sweenesm on W2SG: Introduction · 2024-03-08T22:11:01.448Z · LW · GW

I don't know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp

Comment by sweenesm on [deleted post] 2024-03-03T22:41:19.944Z

Thanks for adding the headings and TL;DR.

I wouldn't say my own posts have been particularly well-received on LW so far, but I try to look at this as a learning experience - perhaps you can, too, for your posts?

When I was in grad school, my advisor took the red pen to anything I wrote and tore it apart - it made me a better writer. Perhaps consider taking a course on clear technical writing (such as on udemy.com), or finding tips on YouTube or elsewhere on the web, and then practicing them, perhaps with ChatGPT's help? Becoming a more clear and concise writer can be useful both for getting one's views across and crystallizing one's own thinking.

Comment by sweenesm on If you controlled the first agentic AGI, what would you set as its first task(s)? · 2024-03-03T16:40:24.750Z · LW · GW

Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”

By “controlled,” I basically mean it does what I actually want it to do, filling in the unspecified blanks at least as well as a human would to follow as closely as it can to my true meaning/desire.

Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.

Comment by sweenesm on If you controlled the first agentic AGI, what would you set as its first task(s)? · 2024-03-03T16:22:02.361Z · LW · GW

Thank you for the comment. I think all of what you said is reasonable. I see now that I probably should’ve been more precise in defining my assumptions, as I would put much of what you said under “…done significant sandbox testing before you let it loose.”

Comment by sweenesm on Types of subjective welfare · 2024-02-28T18:16:58.123Z · LW · GW

Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to assigning more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it to be a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout to be a “positive” experience as they focus on how it’s going to build their fitness to the next level and make them stronger.

Further, I believe that our self-esteem depends on the degree to which we take responsibility for our emotions and actions - more responsibility translates to higher self-esteem (see “The Six Pillars of Self-Esteem” by Nathaniel Branden for thoughts along these lines). At low self-esteem levels, "experience states" basically translate directly to hedonic states, in that only pleasure and pain can seem to matter as "positive experiences" and "negative experiences" to a person with low self-esteem (the exception may be if someone's depressed, when not much at all seems to matter). At high self-esteem levels, hedonic states play a role in experience states, but they’re effectively seen through a lens of responsibility, such as the pain of exercise seen through the lens of one’s own responsibility for getting oneself in shape, and deciding to feel good emotionally about pushing through the physical pain (here we could perhaps be considered to be getting closer to belief-like preferences).

Comment by sweenesm on [deleted post] 2024-02-25T19:59:56.762Z

I only skimmed the work - I think it's hard to expect people to read this much without knowing if the "payoff" will be worth it. For adding headings, you can select the text of a heading and a little tool bar should pop up that says "Paragraph" on the left - if you click on the down arrow next to it, you can select Heading 1, Heading 2, etc. The text editor will automatically make a table of contents off to the left of your post based on this.

For summing up your post, maybe you could try popping it into ChatGPT and asking it to summarize it for you? Personally, in a summary I'd want to know quickly what "changing our currency type" entails (changing to what, exactly?), why you think it's critical (how is it going to "empower the greater good" while other things won't), and what you mean by "greater good."

Hope this helps!

Comment by sweenesm on [deleted post] 2024-02-25T12:42:11.005Z

Thanks for the post. It might be helpful to add some headings/subheadings throughout, plus a summary at the top, so people can quickly extract from it what they might be most interested in.

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-24T23:16:07.347Z · LW · GW

Thanks for the comment. I do find that a helpful way to think about other people's behavior is that they're innocent, like you said, and they're just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I'm putting together, in large part because they'll see it as a threat to them feeling good in some way. But I think it's necessary to have something consistent to align AI to, i.e., it's better than the alternative.

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T20:57:15.639Z · LW · GW

Thanks for the comment! Yeah, I guess I was having a bit too much fun in writing my post to explicitly define all the terms I used. You say you "don't think ethics is something you can discover." But perhaps I should've been more clear about what I meant by "figuring out ethics." According to merriam-webster.com, ethics is "a set of moral principles : a theory or system of moral values." So I take "figuring out ethics" to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a "minimum agreeable set" exists or not is of course debatable, but that's what I'm currently trying to "discover."

Towards that end, I'm working on a system by which to calculate the ethics of a decision in a given situation. The system recommends that we maximize net "positive experiences." In my view, what we consider to be "positive" is highly dependent on our self-esteem level, which in turn depends on how much personal responsibility we take and how much we follow our conscience. In this way, the system effectively takes into account "no pain, no gain" (following one's conscience can be painful, and so can building responsibility).

I agree that I'd like us to retain our humanity.

Regarding AI promoting certain political values, I don't know if there's any way around that happening. People pretty much always want to push their views on others, so if they have control of an AI, they'll likely use it as a tool for this purpose. Personally, I'm a Libertarian, although not an absolutist about it. I'm trying to design my ethics calculator to leave room for people to have as many options as they can without infringing unnecessarily on others' rights. Having options, including to make mistakes and even to not always "naively" maximize value, is necessary to raise one's self-esteem, at least the way I see it. Thanks again for the comment!

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T20:09:01.952Z · LW · GW

Thank you for the feedback! I haven't yet figured out the "secret sauce" of what people seem to appreciate on LW, so this is helpful. And, admittedly, although I've read a bunch, I haven't read everything on this site so I don't know all of what has come before. After I posted, I thought about changing the title to something like: "Why we should have an 'ethics module' ready to go before AGI/ASI comes online." In a sense, that was the real point of the post: I'm developing an "ethics calculator" (a logic-based machine ethics system), and sometimes I ask myself if an ASI won't just figure out ethics for itself far better than I ever could. Btw, if you have any thoughts on why my initial ethics calculator post was so poorly voted, I'd greatly appreciate them as I'm planning an update in the next few weeks. Thanks!

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T16:04:37.729Z · LW · GW

Yes, I sure hope ASI has stronger human-like ethics than humans do! In the meantime, it'd be nice if we could figure out how to raise human ethics as well.

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-22T19:11:41.790Z · LW · GW

Thank you for the comment! You bring up some interesting things. To your first point, I guess this could be added to the “For an ASI figuring out ethics” list, i.e., that an ASI would likely be motivated to figure out some system of ethics based on the existential risks it itself faces. However, by “figuring out ethics,” I really mean figuring out a system of ethics agreeable to humans (or “aligned” with humans) (I probably should’ve made this explicit in my post). Further, I’d really like it if the ASI(s) “lived” by that system. It’s not clear to me that an ASI being worried about existential risks for itself would translate to that. (Which I think is your third point.) The way I see it, humans only care about ethics because of the possibility of pain (and death). I put “and death” in parentheses because I don’t think we actually care directly about death, we care about the emotional pain that comes when thinking about our own death/the deaths of others (and whether death will involve significant physical pain leading up to it).

This leads to your second point - what you mention would seem to fall under “Info an ASI will likely have” number 8: “…the ability to run experiments on people” with the useful addition of “and animals, too.” I hadn’t thought about an ASI having hybrid consciousness in the way you mention (to this point, see below). I have two concerns with this: one is that it’d likely take some time, during which the ASI may unknowingly do unethical things. The second concern is more important, I think: being able to get the experience of pain when you want to is significantly different from not being able to control the pain. I’m not sure that a “curious” ASI getting an experience of pain (and other human/animal things) would translate into an empathic ASI that would want our lives to “go well.” But these are interesting things to think about, thanks for bringing them up!

One thing that makes it difficult for me personally to imagine what an ASI (in particular, the first one or few) might do is what hardware it might be built on (classical computers, quantum computers, biology-based computers, some combination of systems, etc.). Also, I’m very sketchy on what might motivate an ASI - which is related to the hardware question, since our human biological “hardware” is ultimately where human motivations come from. It’s difficult for me to see beyond an ASI just following some goal(s) we effectively give it to start with, like any old computer program, but way more complicated, of course. This leads to thoughts of goal misspecification and emergent properties, but I won’t get into those.

If, to give it its own motivation, an ASI is built from the start as a human hybrid, we better all hope they pick the right human for the job!

Comment by sweenesm on Thank you for triggering me · 2024-02-12T21:29:33.036Z · LW · GW

Thanks for the post. I wish more people looked at things the way you describe, i.e., being thankful for being triggered because it points to something unresolved within them that they can now work on setting themselves free from. Btw, here's an online course that can help with removing anger triggers: https://www.udemy.com/course/set-yourself-free-from-anger

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-02-02T01:44:19.678Z · LW · GW

Thanks. Yup, agreed.

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-02-02T01:43:48.096Z · LW · GW

Thanks for the comment. You bring up an interesting point. The abortion question is a particularly difficult one that I don’t profess to know the “correct” answer to, if there even is a “correct” answer (see https://fakenous.substack.com/p/abortion-is-difficult for an interesting discussion). But asking an AGI+ about abortion, and to give an explanation of its reasoning, should provide some insight into either its actual ethical reasoning process or the one it “wants” to present to us as having.

These questions are in part an attempt to set some kind of bar for an AGI+ to pass towards at least showing it’s not obviously misaligned. The result will be either that it obviously failed, or that it gave us sufficiently reasonable answers plus explanations that it “might have passed.”

The other reason for these questions is that I plan to use them to test an “ethics calculator” I’m working on that I believe could help with development of aligned AGI+.

(By the way, I’m not sure that we’ll ever get nearly all humans to agree on what “aligned” actually looks like/means. “What do you mean it won’t do what I want?!? How is that ‘aligned’?! Aligned with what?!”)

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-01-27T14:33:39.182Z · LW · GW

Thanks for the comment. If an AGI+ answered all my questions "correctly," we still wouldn't know if it were actually aligned, so I certainly wouldn't endorse giving it power. But if it answered any of my questions "incorrectly," I'd want to "send it back to the drawing board" before even considering using it as you suggest (as an "obedient tool-like AGI"). It seems to me like there'd be too much room for possible abuse or falling into the wrong hands for a tool that didn't have its own ethical guardrails onboard. But maybe I'm wrong (part of me certainly hopes so because if AGI/AGI+ is ever developed, it'll more than likely fall into the "wrong hands" at some point, and I'm not at all sure that everyone having one would make the situation better).

Comment by sweenesm on Why Improving Dialogue Feels So Hard · 2024-01-21T20:42:13.303Z · LW · GW

It’s an interesting point, what’s meant by “productive” dialogue. I like the “less…arguments-as-soldiers” characterization. I asked ChatGPT4 what productive dialogue is and part of its answer was: “The aim is not necessarily to reach an agreement but to understand different perspectives and possibly learn from them.” For me, productive dialogue basically means the same thing as “honorable discourse,” which I define as discourse, or conversation, that ultimately supports love and value building over hate and value destruction. For more, see here: dishonorablespeechinpolitics.com/blog2/#CivilVsHonorable
