Posts

Creating a “Conscience Calculator” to Guard-Rail an AGI 2024-08-12T16:03:30.826Z
Lamini’s Targeted Hallucination Reduction May Be a Big Deal for Job Automation 2024-06-18T15:29:38.490Z
How to Give Coming AGI's the Best Chance of Figuring Out Ethics for Us 2024-05-23T19:44:42.386Z
sweenesm's Shortform 2024-04-26T11:42:15.846Z
Update on Developing an Ethics Calculator to Align an AGI to 2024-03-12T12:33:55.092Z
If you controlled the first agentic AGI, what would you set as its first task(s)? 2024-03-03T14:16:49.708Z
Thoughts for and against an ASI figuring out ethics for itself 2024-02-20T23:40:56.770Z
Proposal for an AI Safety Prize 2024-01-31T18:35:48.130Z
Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics 2024-01-26T23:40:23.074Z
How to Promote More Productive Dialogue Outside of LessWrong 2024-01-15T14:16:59.971Z
Towards an Ethics Calculator for Use by an AGI 2023-12-12T18:37:47.407Z

Comments

Comment by sweenesm on Understanding Emergence in Large Language Models · 2024-11-30T02:50:58.224Z · LW · GW

Thanks for the post. I think it'd be helpful if you could add some links to references for some of the things you say, such as:

For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.

Comment by sweenesm on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes · 2024-09-13T14:31:07.327Z · LW · GW

Any update on when/if prizes are expected to be awarded? Thank you.

Comment by sweenesm on AI x Human Flourishing: Introducing the Cosmos Institute · 2024-09-05T21:10:14.684Z · LW · GW

Thanks for the post and congratulations on starting this initiative/institute! I'm glad to see more people drawing attention to the need for some serious philosophical work as AI technology continues to advance (e.g., Stephen Wolfram).

One suggestion: consider expanding the fields you engage with to include those of moral psychology and of personal development (e.g., The Option Institute, Tony Robbins, Nathaniel Branden).

Best of luck on this project being a success!

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-27T22:44:52.807Z · LW · GW

Thanks for the comment. You might be right that any hardware/software can ultimately be tampered with, especially if an ASI is driving/helping with the jailbreaking process. It seems likely that silicon-based GPU's will be the hardware to get us to the first AGI's, but this isn't an absolute certainty since people are working on other routes such as thermodynamic computing. That makes things harder to predict, but it doesn't invalidate your take on things, I think. My not-very-well-researched initial thought was something like this (chips that self-destruct when tampered with).

I envision people having AGI-controlled robots at some point, which may complicate things in terms of having the software/hardware inaccessible to people, unless the robot couldn't operate without an internet connection, i.e., part of its hardware/software was in the cloud. It's likely the hardware in the robot itself could still be tampered with in this situation, though, so it still seems like we'd want some kind of self-destructing chip to avoid tampering, even if this ultimately only buys us time until AGI+'s/ASI's figure a way around this.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T19:02:58.644Z · LW · GW

Agreed, "sticky" alignment is a big issue - see my reply above to Seth Herd's comment. Thanks.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T19:02:14.057Z · LW · GW

Except that timelines are anyone's guess. People with more relevant expertise have better guesses.

Sure. Me being sloppy with my language again, sorry. It does feel like having more than a decade to AGI is fairly unlikely.

I also agree that people are going to want AGI's aligned to their own intents. That's why I'd also like to see money being dedicated to research on "locking in" a conscience module in an AGI, most preferably on a hardware level. So basically no one could sell an AGI without a conscience module onboard that was safe against AGI-level tampering (once we get to ASI's, all bets are off, of course). 

I actually see this as the most difficult problem in the AGI general alignment space - not being able to align an AGI to anything (inner alignment) or what to align an AGI to ("wise" human values), but how to keep an AGI aligned to these values when so many people (both people with bad intent and intelligent but "naive" people) are going to be trying with all their might (and near-AGI's they have available to them) to "jailbreak" AGI's.[1] And the problem will be even harder if we need a mechanism to update the "wise" human values, which I think we really should have unless we make the AGI's "disposable."

  1. ^ To be clear, I'm taking "inner alignment" as being "solved" when the AGI doesn't try to unalign itself from what its original creator wanted to align it to.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T17:35:44.732Z · LW · GW

Sorry, I should've been more clear: I meant to say let's not give up on getting "value alignment" figured out in time, i.e., before the first real AGI's (ones capable of pivotal acts) come online. Of course, the probability of that depends a lot on how far away AGI's are, which I think only the most "optimistic" people (e.g., Elon Musk) put as 2 years or less. I hope we have more time than that, but it's anyone's guess.

I'd rather that companies/charities start putting some serious funding towards "artificial conscience" work now to try to lower the risks associated with waiting until boxed AGI or intent aligned AGI come online to figure it out for/with us. But my view on this is perhaps skewed by putting significant probability on being in a situation in which AGI's in the hands of bad actors either come online first or right on the heels of those of good actors (e.g., due to effective espionage), and there's just not enough time for the "good AGI's" to figure out how to minimize collateral damage in defending against "bad AGI's." Either way, I believe we should be encouraging people of moral psychology/philosophical backgrounds who aren't strongly suited to help make progress on "inner alignment" to be thinking hard about the "value alignment"/"artificial conscience" problem.

Comment by sweenesm on If we solve alignment, do we die anyway? · 2024-08-23T14:41:15.565Z · LW · GW

Thanks for writing this, I think it's good to have discussions around these sorts of ideas.

Please, though, let's not give up on "value alignment," or, rather, conscience guard-railing, where the artificial conscience is in line with human values.

Sometimes when enough intelligent people declare something's too hard to even try at, it becomes a self-fulfilling prophecy - most people may give up on it and then of course it's never achieved. We do want to be realistic, I think, but still put in effort in areas where there could be a big payoff when we're really not sure if it'll be as hard as it seems.

Comment by sweenesm on Andrew Burns's Shortform · 2024-06-26T14:41:13.941Z · LW · GW

This article on work culture in China might be relevant: https://www.businessinsider.com/china-work-culture-differences-west-2024-6

If there's a similar work culture in AI innovation, that doesn't sound optimal for developing something faster than the U.S. when "outside the LLM" thinking might ultimately be needed to develop AGI.

Also, Xi has recently called for more innovation in AI and other tech sectors:

https://www.msn.com/en-ie/money/other/xi-jinping-admits-china-is-relatively-weak-on-innovation-and-needs-more-talent-to-dominate-the-tech-battlefield/ar-BB1oUuk1

Comment by sweenesm on Suffering Is Not Pain · 2024-06-19T19:19:57.311Z · LW · GW

Thanks for the reply.

Regarding your disagreement with my point #2 - perhaps I should’ve been more precise in my wording. Let me try again, with words added in bold: “Although pain doesn't directly cause suffering, there would be no suffering if there were no such thing as pain…” What that means is you don’t need to be experiencing pain in the moment that you initiate suffering, but you do need the mental imprint of having experienced some kind of pain in your lifetime. If you have no memory of experiencing pain, then you have nothing to avert. And without pain, I don’t believe you can have pleasure, so nothing to crave either.

Further, if you could abolish pain as David Pearce suggests, by bioengineering people to only feel different shades of pleasure (I have serious doubts about this), you’d abolish suffering at the same time. No person bioengineered in such a way would suffer over not feeling higher states of pleasure (i.e., “crave” pleasure) because suffering has a negative feeling associated with it - part of it feels like pain, which we supposedly wouldn’t have the ability to feel.

This gets to another point: one could define suffering as the creation of an unpleasant physical sensation or emotion (i.e., pain) through a thought process that we may or may not be aware of. Example: the sadness that we typically naturally feel when someone we love dies is pain, but if we artificially extend this pain out with thoughts of the future or past, not the moment, such as, “will this pain ever stop?,” or, “If only I’d done something different, they might still be alive,” then it becomes suffering. This first example thought, by the way, could be considered aversion to pain/craving for it to stop, while the second could be considered craving that the present were different (that you weren’t in pain and your loved one were still alive). The key distinctions for me are that pain can be experienced “in the moment” without a thought process on top of it, and it can’t be entirely avoided in life, while suffering ultimately comes from thoughts, it falls away when one’s experiencing things in the moment, and it can be avoided because it’s an optional thing one chooses to do for some reason. (A possible reason could be to give oneself an excuse to do something other than feel pain, such as to give oneself an excuse to stop exercising by amping up the pain with suffering.)

Regarding my point #4, I honestly don’t know what animals’ experiences are like or how much cognition they’re capable of. I do think, though, that if they aren’t capable of getting “out of the moment” with thoughts of the future or past, then they can’t suffer, they can only feel the pain/pleasure of the moment. For instance, do chickens suffer with thoughts of, “I don’t know how much longer I can take this,” or do they just experience the discomfort of their situation with the natural fight or flight mechanism and Pavlovian links of their body leading them to try to get away from it? Either way, pain by itself is an unpleasant experience and I think we should try to minimize imposing it on other beings.

It’s also interesting how much upvoted resistance you’ve gotten to the message of this post. Eckhart Tolle (“The Power of Now,” https://shop.eckharttolle.com/products/the-power-of-now) is a modern-day proponent of living in the moment to make suffering fall away, and he also encounters resistance: https://www.reddit.com/r/EckhartTolle/comments/sa1p4x/tolles_view_of_suffering_is_horrifying/

Comment by sweenesm on Suffering Is Not Pain · 2024-06-18T19:14:46.784Z · LW · GW

Thank you for the post! I basically agree with what you're saying, although I myself have used the term "suffering" in an imprecise way - it often seems to be the language used in the context of utilitarianism when talking about welfare. I first learned the distinction you mention between pain and suffering during some personal development work years ago, so outside the direct field of philosophy. 

I would add a couple of things:

  1. Pain is experienced "in the moment," while suffering comes from the stories we tell ourselves and the meanings we make of things ("craving, aversion, and clinging" are part of this - for example, one story we tell ourselves could be: if I don't get what I crave, I somehow won't be OK). This means that if we're fully experiencing the present moment, suffering falls away.
  2. Although pain doesn't directly cause suffering, there would be no suffering if there were no pain or chance of pain (I also believe there'd be no pleasure without pain as a comparison point)
  3. The lower someone's self-esteem, the less responsibility they take for their emotions and the more likely they are to believe that pain causes suffering, not that their own cognitive processes that they can change with effort cause their suffering - this is why I think interventions to help raise people's self-esteem and personal responsibility levels (especially for emotions) are so important
  4. It's difficult to know if animals "suffer" or not since they seem to live much more in the moment than humans and likely have less capacity to make up stories around pain to turn it into suffering. Even if they exhibit behavior that seems to indicate suffering, it's hard to know if this isn't just hardwired or from Pavlovian links. It's probably good to err on the side of caution, though, and assume many animals can suffer (in addition to feeling pain) until proven otherwise.

Comment by sweenesm on Shane Legg's necessary properties for every AGI Safety plan · 2024-05-01T19:00:41.150Z · LW · GW

I basically agree with Shane's take for any AGI that isn't trying to be deceptive with some hidden goal(s). 

(Btw, I haven't seen anyone outline exactly how an AGI could gain its own goals independently of goals given to it by humans - if anyone has ideas on this, please share. I'm not saying it won't happen, I'd just like a clear mechanism for it if someone has it. Note: I'm not talking here about instrumental goals such as power seeking.)

What I find a bit surprising is the relative lack of work that seems to be going on to solve condition 3: specification of ethics for an AGI to follow. I have a few ideas on why this may be the case:

  1. Most engineers care about making things work in the real world, but don't want the responsibility to do this for ethics because: 1) it's not their area of expertise, and 2) they'll likely take on major blame if they get things "wrong" (and it's almost guaranteed that someone won't like their system of ethics and say they got it "wrong")
  2. Most philosophers haven't had to care much about making things work in the real world, and don't seem excited about possibly having to make engineering-type compromises in their system of ethics to make it work
  3. Most people who've studied philosophy at all probably don't think it's possible to come up with a consistent system of ethics to follow, or at least they don't think people will come up with it anytime soon, but hopefully an AGI might

Personally, I think we better have a consistent system of ethics for an AGI to follow ASAP because we'll likely be in significant trouble if malicious AGI come online and go on the offensive before we have at least one ethics-guided AGI to help defend us in a way that minimizes collateral damage.

Comment by sweenesm on sweenesm's Shortform · 2024-04-26T11:42:18.956Z · LW · GW

American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

Comment by sweenesm on yanni's Shortform · 2024-04-21T12:27:38.411Z · LW · GW

Nice write-up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is Raising One's Self-Esteem: If high self-esteem can be thought of as consistently feeling good about oneself, then if someone takes responsibility for their emotions, recognizing that they can change their emotions at will, they can consistently choose to feel good about and love themselves as long as their conscience is clear.

This is in line with "The Six Pillars of Self-Esteem" by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.

Comment by sweenesm on What if Ethics is Provably Self-Contradictory? · 2024-04-18T14:13:17.035Z · LW · GW

Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one with incorporation of the effects of rights, self-esteem (personal responsibility) and conscience. It doesn’t “fix” the repugnant or very repugnant conclusions, but it says how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.

It’s an interesting question as to what the implications are if it’s impossible to make a self-consistent ethical framework. If we can’t convey ethics to an AI in a self-consistent form, then we’ll likely rely in part on giving it lots of example situations (that not all humans/ethicists will agree on) to learn from and hope it’ll augment this with learning from human behavior, and then generalize well beyond all this not-perfectly-consistent training data. (Sounds a bit sketchy, doesn't it - at least for the first AGI's, but perhaps ASI's could fare better?) Generalize “well” could be taken to mean that an AI won’t do anything that most people would strongly disapprove of if they understood the true implications of the action.

[This paragraph I'm less sure of, so take it with a grain of salt:] An AI that was trying to act ethically and taking the approval of relatively wise humans as some kind of signal of this might try to hide/avoid ethical inconsistencies that humans would pick up on. It would probably develop a long list of situations where inconsistencies seemed to arise and of actions it thought it could "get away with" versus not. I'm not talking about deception with malice, just sneakiness to try to keep most humans more or less happy, which, I assume would be part of what its ethics system would deem as good/valuable. It seems to me that problems may come to the surface if/when an "ethical" AI is defending against bad AI, when it may no longer be able to hide inconsistencies in all the situations that could rapidly come up. 

If it is possible to construct a self-consistent ethical framework and we haven't done it in time or laid the groundwork for it to be done quickly by the first "transformative" AI's, then we'll have basically dug our own grave for the consequences we get, in my opinion. Work to try to come up with a self-consistent ethical framework seems to me to be a very underexplored area for AI safety.

Comment by sweenesm on Consequentialism is a compass, not a judge · 2024-04-13T12:15:34.173Z · LW · GW

Thanks for the interesting post! I basically agree with what you're saying, and it's mostly in line with the version of utilitarianism I'm working on refining. Check out a write-up on it here.

Comment by sweenesm on Quadratic Reciprocity's Shortform · 2024-03-27T20:01:01.101Z · LW · GW

Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!

Comment by sweenesm on Anxiety vs. Depression · 2024-03-17T03:28:26.591Z · LW · GW

Thank you for sharing this. I'm sorry that anxiety and depression continue to haunt you. I've had my own, less extreme, struggles, so I can relate to some of what you wrote. In my case, I was lucky enough to find some good personal development resources that helped me a lot. One I might suggest for you to check out is: https://www.udemy.com/course/set-yourself-free-from-anger/. You can often get this course on sale for <$20. From what you've described, I think the "Mini Me" section might be most useful to you. Hope this helps you in some way.

Comment by sweenesm on How I turned doing therapy into object-level AI safety research · 2024-03-14T12:03:08.560Z · LW · GW

Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I "should" have been succeeding as an engineer. It got me focused on self-esteem, and that's a key feature of the AI safety path I'm pursuing.

If other AI safety researchers are interested in a relatively easy way to get started on their own path, I suggest this online course which can be purchased for <$20 when on sale: https://www.udemy.com/course/set-yourself-free-from-anger

Good luck on your boundaries work!

Comment by sweenesm on Update on Developing an Ethics Calculator to Align an AGI to · 2024-03-14T00:46:51.105Z · LW · GW

Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGI’s won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be) - no human has done it yet, as far as I know - but an ASI likely will - if it’s possible.

If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview - and even re-derive the calculator - as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!

Comment by sweenesm on W2SG: Introduction · 2024-03-08T22:11:01.448Z · LW · GW

I don't know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp

Comment by sweenesm on [deleted post] 2024-03-03T22:41:19.944Z

Thanks for adding the headings and TL;DR. 

I wouldn't say my own posts have been particularly well-received on LW so far, but I try to look at this as a learning experience - perhaps you can, too, for your posts? 

When I was in grad school, my advisor took the red pen to anything I wrote and tore it apart - it made me a better writer. Perhaps consider taking a course on clear technical writing (such as on udemy.com), or finding tips on YouTube or elsewhere on the web, and then practicing them, perhaps with ChatGPT's help? Becoming a clearer and more concise writer can be useful both for getting one's views across and crystallizing one's own thinking.

Comment by sweenesm on If you controlled the first agentic AGI, what would you set as its first task(s)? · 2024-03-03T16:40:24.750Z · LW · GW

Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”

By “controlled,” I basically mean it does what I actually want it to do, filling in the unspecified blanks at least as well as a human would, to follow my true meaning/desire as closely as it can.

Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.

Comment by sweenesm on If you controlled the first agentic AGI, what would you set as its first task(s)? · 2024-03-03T16:22:02.361Z · LW · GW

Thank you for the comment. I think all of what you said is reasonable. I see now that I probably should’ve been more precise in defining my assumptions, as I would put much of what you said under “…done significant sandbox testing before you let it loose.”

Comment by sweenesm on Types of subjective welfare · 2024-02-28T18:16:58.123Z · LW · GW

Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to assigning more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it to be a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout to be a “positive” experience as they focus on how it’s going to build their fitness to the next level and make them stronger.

Further, I believe that our self-esteem depends on the degree to which we take responsibility for our emotions and actions - more responsibility translates to higher self-esteem (see “The Six Pillars of Self-Esteem” by Nathaniel Branden for thoughts along these lines). At low self-esteem levels, "experience states" basically translate directly to hedonic states, in that only pleasure and pain can seem to matter as "positive experiences" and "negative experiences" to a person with low self-esteem (the exception may be if someone's depressed, when not much at all seems to matter). At high self-esteem levels, hedonic states play a role in experience states, but they’re effectively seen through a lens of responsibility, such as the pain of exercise seen through the lens of one’s own responsibility for getting oneself in shape, and deciding to feel good emotionally about pushing through the physical pain (here we could perhaps be considered to be getting closer to belief-like preferences).

Comment by sweenesm on [deleted post] 2024-02-25T19:59:56.762Z

I only skimmed the work - I think it's hard to expect people to read this much without knowing if the "payoff" will be worth it. For adding headings, you can select the text of a heading and a little tool bar should pop up that says "Paragraph" on the left - if you click on the down arrow next to it, you can select Heading 1, Heading 2, etc. The text editor will automatically make a table of contents off to the left of your post based on this. 

For summing up your post, maybe you could try popping it into ChatGPT and asking it to summarize it for you? Personally, in a summary I'd want to know quickly what "changing our currency type" entails (changing to what, exactly?), why you think it's critical (how is it going to "empower the greater good" while other things won't), and what you mean by "greater good."

Hope this helps!

Comment by sweenesm on [deleted post] 2024-02-25T12:42:11.005Z

Thanks for the post. It might be helpful to add some headings/subheadings throughout, plus a summary at the top, so people can quickly extract from it what they might be most interested in. 

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-24T23:16:07.347Z · LW · GW

Thanks for the comment. I do find that a helpful way to think about other people's behavior is that they're innocent, like you said, and they're just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I'm putting together, in large part because they'll see it as a threat to them feeling good in some way. But I think it's necessary to have something consistent to align AI to, i.e., it's better than the alternative.

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T20:57:15.639Z · LW · GW

Thanks for the comment! Yeah, I guess I was having a bit too much fun in writing my post to explicitly define all the terms I used. You say you "don't think ethics is something you can discover." But perhaps I should've been more clear about what I meant by "figuring out ethics." According to merriam-webster.com, ethics is "a set of moral principles : a theory or system of moral values." So I take "figuring out ethics" to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a "minimum agreeable set" exists or not is of course debatable, but that's what I'm currently trying to "discover."

Towards that end, I'm working on a system by which to calculate the ethics of a decision in a given situation. The system recommends that we maximize net "positive experiences." In my view, what we consider to be "positive" is highly dependent on our self-esteem level, which in turn depends on how much personal responsibility we take and how much we follow our conscience. In this way, the system effectively takes into account "no pain, no gain" (following one's conscience can be painful, and so can building responsibility).

I agree that I'd like us to retain our humanity.

Regarding AI promoting certain political values, I don't know if there's any way around that happening. People pretty much always want to push their views on others, so if they have control of an AI, they'll likely use it as a tool for this purpose. Personally, I'm a Libertarian, although not an absolutist about it. I'm trying to design my ethics calculator to leave room for people to have as many options as they can without infringing unnecessarily on others' rights. Having options, including to make mistakes and even to not always "naively" maximize value, is necessary to raise one's self-esteem, at least the way I see it. Thanks again for the comment!

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T20:09:01.952Z · LW · GW

Thank you for the feedback! I haven't yet figured out the "secret sauce" of what people seem to appreciate on LW, so this is helpful. And, admittedly, although I've read a bunch, I haven't read everything on this site so I don't know all of what has come before. After I posted, I thought about changing the title to something like: "Why we should have an 'ethics module' ready to go before AGI/ASI comes online." In a sense, that was the real point of the post: I'm developing an "ethics calculator" (a logic-based machine ethics system), and sometimes I ask myself if an ASI won't just figure out ethics for itself far better than I ever could. Btw, if you have any thoughts on why my initial ethics calculator post was so poorly voted, I'd greatly appreciate them as I'm planning an update in the next few weeks. Thanks!

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T16:04:37.729Z · LW · GW

Yes, I sure hope ASI has stronger human-like ethics than humans do! In the meantime, it'd be nice if we could figure out how to raise human ethics as well.

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-22T19:11:41.790Z · LW · GW

Thank you for the comment! You bring up some interesting things. To your first point, I guess this could be added to the “For an ASI figuring out ethics” list, i.e., that an ASI would likely be motivated to figure out some system of ethics based on the existential risks it itself faces. However, by “figuring out ethics,” I really mean figuring out a system of ethics agreeable to humans (or “aligned” with humans) (I probably should’ve made this explicit in my post). Further, I’d really like it if the ASI(s) “lived” by that system. It’s not clear to me that an ASI being worried about existential risks for itself would translate to that. (Which I think is your third point.) The way I see it, humans only care about ethics because of the possibility of pain (and death). I put “and death” in parentheses because I don’t think we actually care directly about death, we care about the emotional pain that comes when thinking about our own death/the deaths of others (and whether death will involve significant physical pain leading up to it).

This leads to your second point - what you mention would seem to fall under “Info an ASI will likely have” number 8: “…the ability to run experiments on people” with the useful addition of “and animals, too.” I hadn’t thought about an ASI having hybrid consciousness in the way you mention (to this point, see below). I have two concerns with this: one is that it’d likely take some time, during which the ASI may unknowingly do unethical things. The second concern is more important, I think: being able to get the experience of pain when you want to is significantly different from not being able to control the pain. I’m not sure that a “curious” ASI getting an experience of pain (and other human/animal things) would translate into an empathic ASI that would want our lives to “go well.” But these are interesting things to think about, thanks for bringing them up!

One thing that makes it difficult for me personally to imagine what an ASI (in particular, the first one or few) might do is not knowing what hardware it might be built on (classical computers, quantum computers, biology-based computers, some combination of systems, etc.). Also, I’m very hazy on what might motivate an ASI - which is related to the hardware question, since our human biological “hardware” is ultimately where human motivations come from. It’s difficult for me to see beyond an ASI just following some goal(s) we effectively give it to start with, like any old computer program, but way more complicated, of course. This leads to thoughts of goal misspecification and emergent properties, but I won’t get into those.

If, to give it its own motivation, an ASI is built from the start as a human hybrid, we’d better all hope they pick the right human for the job!

Comment by sweenesm on Thank you for triggering me · 2024-02-12T21:29:33.036Z · LW · GW

Thanks for the post. I wish more people looked at things the way you describe, i.e., being thankful for being triggered because it points to something unresolved within them that they can now work on setting themselves free from. Btw, here's an online course that can help with removing anger triggers: https://www.udemy.com/course/set-yourself-free-from-anger

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-02-02T01:44:19.678Z · LW · GW

Thanks. Yup, agreed.

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-02-02T01:43:48.096Z · LW · GW

Thanks for the comment. You bring up an interesting point. The abortion question is a particularly difficult one that I don’t profess to know the “correct” answer to, if there even is a “correct” answer (see https://fakenous.substack.com/p/abortion-is-difficult for an interesting discussion). But asking an AGI+ about abortion, and to give an explanation of its reasoning, should provide some insight into either its actual ethical reasoning process or the one it “wants” to present to us as having.

These questions are in part an attempt to set some kind of bar for an AGI+ to pass towards at least showing it’s not obviously misaligned. The result will be either that it obviously failed, or that it gave us sufficiently reasonable answers plus explanations that it “might have passed.”

The other reason for these questions is that I plan to use them to test an “ethics calculator” I’m working on that I believe could help with development of aligned AGI+.

(By the way, I’m not sure that we’ll ever get nearly all humans to agree on what “aligned” actually looks like/means. “What do you mean it won’t do what I want?!? How is that ‘aligned’?! Aligned with what?!”)

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-01-27T14:33:39.182Z · LW · GW

Thanks for the comment. If an AGI+ answered all my questions "correctly," we still wouldn't know if it were actually aligned, so I certainly wouldn't endorse giving it power. But if it answered any of my questions "incorrectly," I'd want to "send it back to the drawing board" before even considering using it as you suggest (as an "obedient tool-like AGI"). It seems to me like there'd be too much room for possible abuse or falling into the wrong hands for a tool that didn't have its own ethical guardrails onboard. But maybe I'm wrong (part of me certainly hopes so because if AGI/AGI+ is ever developed, it'll more than likely fall into the "wrong hands" at some point, and I'm not at all sure that everyone having one would make the situation better).

Comment by sweenesm on Why Improving Dialogue Feels So Hard · 2024-01-21T20:42:13.303Z · LW · GW

It’s an interesting point, what’s meant by “productive” dialogue. I like the “less…arguments-as-soldiers” characterization. I asked ChatGPT4 what productive dialogue is and part of its answer was: “The aim is not necessarily to reach an agreement but to understand different perspectives and possibly learn from them.” For me, productive dialogue basically means the same thing as “honorable discourse,” which I define as discourse, or conversation, that ultimately supports love and value building over hate and value destruction. For more, see here: dishonorablespeechinpolitics.com/blog2/#CivilVsHonorable

Comment by sweenesm on How to Promote More Productive Dialogue Outside of LessWrong · 2024-01-18T04:00:48.997Z · LW · GW

I look forward to your post. One thing I'll add at this point is that The Dignity Index group is working on rating politicians' speech using machine learning, in hopes that this could help shift political dialogue. I've done something similar with a somewhat more complicated rating system I developed independently. If you're interested, check out some ratings of politicians' tweets here: twitter.com/DishonorP. I don't feel that rating systems by themselves will have a large impact on shifting behaviors, but seeing that some people put out actually non-partisan ratings may give others a tiny bit more hope in humanity.

Comment by sweenesm on How to Promote More Productive Dialogue Outside of LessWrong · 2024-01-16T03:46:55.136Z · LW · GW

I appreciate the comment; you keyed me in to a bunch of things I wasn’t aware of (The Guild of the Rose, NYC Megameetup, and more). I definitely agree that setting a good example in one’s own life is a great place to start. And yes, several established power structures do stand to lose if people become less easy to manipulate.

I’m still hopeful that there’s some way to make progress if we get enough good minds churning out ideas on how to enroll people into their own personal development. This makes me wonder, though - which is more difficult, human alignment or AI alignment?

Comment by sweenesm on Towards an Ethics Calculator for Use by an AGI · 2024-01-04T16:52:33.362Z · LW · GW

Thank you for the comment. Yes, I agree that “doing a good job of this is going to be extremely challenging.” I know it’s been challenging for me just to get to the point that I’ve gotten to so far (which is somewhat past my original post). I like to joke that I’m just smart enough to give this a decent try and just stupid enough to actually try it. And yes, I’m trying to find a rough approximation as a good starting point, in hopes that it’ll be useful.

Thanks for the suggestion about civil damages - I haven’t looked into that, only criminal “damages” (in terms of criminal sentences) thus far. I actually don’t expect that the first version of my calculations, based on my own ethics/values, will particularly agree with civil damages, but it may be interesting to see if the calculations can be modified to follow an alternate ethical framework (one less focused on self-esteem) that does give reasonable agreement.

Regarding masochistic and sadistic pleasure, it depends on how we define them. One might regard people who enjoy exercise as being into “masochistic pleasure.” That’s not what I mean by it. For masochistic pleasure I basically mean pleasure that comes from one’s own pain, plus self-loathing. Sadistic pleasure would be pleasure that comes from the thought of others’ pain, plus self-loathing (even if it appears as loathing of others, I see it as ultimately self-loathing). Self-loathing involves not taking responsibility for one’s emotions about oneself and is part of having low self-esteem. I appreciate you pointing to the need for clarification on this, and hope it's now clarified a bit. Thanks again for the comment!