Posts

sweenesm's Shortform 2024-04-26T11:42:15.846Z
Update on Developing an Ethics Calculator to Align an AGI to 2024-03-12T12:33:55.092Z
If you controlled the first agentic AGI, what would you set as its first task(s)? 2024-03-03T14:16:49.708Z
Thoughts for and against an ASI figuring out ethics for itself 2024-02-20T23:40:56.770Z
Proposal for an AI Safety Prize 2024-01-31T18:35:48.130Z
Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics 2024-01-26T23:40:23.074Z
How to Promote More Productive Dialogue Outside of LessWrong 2024-01-15T14:16:59.971Z
Towards an Ethics Calculator for Use by an AGI 2023-12-12T18:37:47.407Z

Comments

Comment by sweenesm on Shane Legg's necessary properties for every AGI Safety plan · 2024-05-01T19:00:41.150Z · LW · GW

I basically agree with Shane's take for any AGI that isn't trying to be deceptive with some hidden goal(s). 

(Btw, I haven't seen anyone outline exactly how an AGI could gain its own goals independently of goals given to it by humans - if anyone has ideas on this, please share. I'm not saying it won't happen, I'd just like a clear mechanism for it if someone has it. Note: I'm not talking here about instrumental goals such as power seeking.)

What I find a bit surprising is the relative lack of work that seems to be going on to solve condition 3: specification of ethics for an AGI to follow. I have a few ideas on why this may be the case:

  1. Most engineers care about making things work in the real world, but don't want the responsibility to do this for ethics because: 1) it's not their area of expertise, and 2) they'll likely take on major blame if they get things "wrong" (and it's almost guaranteed that someone won't like their system of ethics and will say they got it "wrong")
  2. Most philosophers haven't had to care much about making things work in the real world, and don't seem excited about possibly having to make engineering-type compromises in their system of ethics to make it work
  3. Most people who've studied philosophy at all probably don't think it's possible to come up with a consistent system of ethics to follow, or at least they don't think people will come up with it anytime soon, but hopefully an AGI might

Personally, I think we'd better have a consistent system of ethics for an AGI to follow ASAP because we'll likely be in significant trouble if malicious AGIs come online and go on the offensive before we have at least one ethics-guided AGI to help defend us in a way that minimizes collateral damage.

Comment by sweenesm on sweenesm's Shortform · 2024-04-26T11:42:18.956Z · LW · GW

American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/

https://www.apaonline.org/page/ai2050

https://ai2050.schmidtsciences.org/hard-problems/

Comment by sweenesm on yanni's Shortform · 2024-04-21T12:27:38.411Z · LW · GW

Nice write up on this (even if it was AI-assisted), thanks for sharing! I believe another benefit is Raising One's Self-Esteem: If high self-esteem can be thought of as consistently feeling good about oneself, then if someone takes responsibility for their emotions, recognizing that they can change their emotions at will, they can consistently choose to feel good about and love themselves as long as their conscience is clear.

This is in line with "The Six Pillars of Self-Esteem" by Nathaniel Branden: living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.

Comment by sweenesm on What if Ethics is Provably Self-Contradictory? · 2024-04-18T14:13:17.035Z · LW · GW

Thanks for the post. I don’t know the answer to whether a self-consistent ethical framework can be constructed, but I’m working on it (without funding). My current best framework is a utilitarian one with incorporation of the effects of rights, self-esteem (personal responsibility) and conscience. It doesn’t “fix” the repugnant or very repugnant conclusions, but it says how you transition from one world to another could matter in terms of the conscience(s) of the person/people who bring it about.

It’s an interesting question as to what the implications are if it’s impossible to make a self-consistent ethical framework. If we can’t convey ethics to an AI in a self-consistent form, then we’ll likely rely in part on giving it lots of example situations (that not all humans/ethicists will agree on) to learn from, hope it’ll augment this with learning from human behavior, and then hope it generalizes well beyond all this not perfectly consistent training data. (Sounds a bit sketchy, doesn't it - at least for the first AGIs, but perhaps ASIs could fare better?) Generalizing “well” could be taken to mean that an AI won’t do anything that most people would strongly disapprove of if they understood the true implications of the action.

[This paragraph I'm less sure of, so take it with a grain of salt:] An AI that was trying to act ethically and taking the approval of relatively wise humans as some kind of signal of this might try to hide/avoid ethical inconsistencies that humans would pick up on. It would probably develop a long list of situations where inconsistencies seemed to arise and of actions it thought it could "get away with" versus not. I'm not talking about deception with malice, just sneakiness to try to keep most humans more or less happy, which, I assume would be part of what its ethics system would deem as good/valuable. It seems to me that problems may come to the surface if/when an "ethical" AI is defending against bad AI, when it may no longer be able to hide inconsistencies in all the situations that could rapidly come up. 

If it is possible to construct a self-consistent ethical framework and we haven't done it in time or laid the groundwork for it to be done quickly by the first "transformative" AIs, then we'll have basically dug our own grave for the consequences we get, in my opinion. Work to try to come up with a self-consistent ethical framework seems to me to be a very underexplored area for AI safety.

Comment by sweenesm on Consequentialism is a compass, not a judge · 2024-04-13T12:15:34.173Z · LW · GW

Thanks for the interesting post! I basically agree with what you're saying, and it's mostly in line with the version of utilitarianism I'm working on refining. Check out a write-up on it here.

Comment by sweenesm on Quadratic Reciprocity's Shortform · 2024-03-27T20:01:01.101Z · LW · GW

Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!

Comment by sweenesm on Anxiety vs. Depression · 2024-03-17T03:28:26.591Z · LW · GW

Thank you for sharing this. I'm sorry that anxiety and depression continue to haunt you. I've had my own, less extreme, struggles, so I can relate to some of what you wrote. In my case, I was lucky enough to find some good personal development resources that helped me a lot. One I might suggest for you to check out is: https://www.udemy.com/course/set-yourself-free-from-anger/. You can often get this course on sale for <$20. From what you've described, I think the "Mini Me" section might be most useful to you. Hope this helps you in some way.

Comment by sweenesm on How I turned doing therapy into object-level AI safety research · 2024-03-14T12:03:08.560Z · LW · GW

Thanks for the interesting post! I agree that understanding ourselves better through therapy or personal development is a great way to gain insights that could be applicable to AI safety. My personal development path got started mostly due to stress from not living up to my unrealistic expectations of how much I "should" have been succeeding as an engineer. It got me focused on self-esteem, and that's a key feature of the AI safety path I'm pursuing.

If other AI safety researchers are interested in a relatively easy way to get started on their own path, I suggest this online course which can be purchased for <$20 when on sale: https://www.udemy.com/course/set-yourself-free-from-anger

Good luck on your boundaries work!

Comment by sweenesm on Update on Developing an Ethics Calculator to Align an AGI to · 2024-03-14T00:46:51.105Z · LW · GW

Thanks for the feedback! I’m not exactly sure what you mean by “no pattern-matching to actually glue those variables to reality.” Are you suggesting that an AGI won’t be able to adequately apply the ethics calculator unless it’s able to re-derive the system for itself based on its own observations of reality? The way I envision things happening is that the first AGIs won’t be able to derive a mathematically consistent system of ethics over all situations (which is what the ethics calculator is supposed to be) - no human has done it yet, as far as I know - but an ASI likely will - if it’s possible.

If a human can figure it out before the first AGI comes online, I think this could (potentially) save us a lot of headaches, and the AGI could then go about figuring out how to tie the ethics calculator to its reality-based worldview - and even re-derive the calculator - as its knowledge/cognitive abilities expand with time. Like I said in the post, I may fail at my goal, but I think it’s worth pursuing, while at the same time I’d be happy for others to pursue what you suggest, and hope they do! Thanks again for the comment!

Comment by sweenesm on W2SG: Introduction · 2024-03-08T22:11:01.448Z · LW · GW

I don't know if you saw this post from yesterday, but you may find it useful: https://www.lesswrong.com/posts/ELbGqXiLbRe6zSkTu/a-review-of-weak-to-strong-generalization-ai-safety-camp

Comment by sweenesm on [deleted post] 2024-03-03T22:41:19.944Z

Thanks for adding the headings and TL;DR. 

I wouldn't say my own posts have been particularly well-received on LW so far, but I try to look at this as a learning experience - perhaps you can, too, for your posts? 

When I was in grad school, my advisor took the red pen to anything I wrote and tore it apart - it made me a better writer. Perhaps consider taking a course on clear technical writing (such as on udemy.com), or finding tips on YouTube or elsewhere on the web, and then practicing them, perhaps with ChatGPT's help? Becoming a more clear and concise writer can be useful both for getting one's views across and crystallizing one's own thinking.

Comment by sweenesm on If you controlled the first agentic AGI, what would you set as its first task(s)? · 2024-03-03T16:40:24.750Z · LW · GW

Thanks for the comment. I agree that context and specifics are key. This is what I was trying to get at with “If you’d like to change or add to these assumptions for your answer, please spell out how.”

By “controlled,” I basically mean it does what I actually want it to do, filling in the unspecified blanks at least as well as a human would to follow as closely as it can to my true meaning/desire.

Thanks for your “more interesting framing” version. Part of the point of this post was to give AGI developers food for thought about what they might want to prioritize for their first AGI to do.

Comment by sweenesm on If you controlled the first agentic AGI, what would you set as its first task(s)? · 2024-03-03T16:22:02.361Z · LW · GW

Thank you for the comment. I think all of what you said is reasonable. I see now that I probably should’ve been more precise in defining my assumptions, as I would put much of what you said under “…done significant sandbox testing before you let it loose.”

Comment by sweenesm on Types of subjective welfare · 2024-02-28T18:16:58.123Z · LW · GW

Thanks for the post. I’d like to propose another possible type of (or really, way of measuring) subjective welfare: self-esteem-influenced experience states. I believe having higher self-esteem generally translates to assigning more of our experiences as “positive.” For instance, someone with low self-esteem may hate exercise and deem the pain of it to be a highly negative experience. Someone with high self-esteem, on the other hand, may consider a particularly hard (painful) workout to be a “positive” experience as they focus on how it’s going to build their fitness to the next level and make them stronger.

Further, I believe that our self-esteem depends on the degree to which we take responsibility for our emotions and actions - more responsibility translates to higher self-esteem (see “The Six Pillars of Self-Esteem” by Nathaniel Branden for thoughts along these lines). At low self-esteem levels, "experience states" basically translate directly to hedonic states, in that only pleasure and pain can seem to matter as "positive experiences" and "negative experiences" to a person with low self-esteem (the exception may be if someone's depressed, when not much at all seems to matter). At high self-esteem levels, hedonic states play a role in experience states, but they’re effectively seen through a lens of responsibility, such as the pain of exercise seen through the lens of one’s own responsibility for getting oneself in shape, and deciding to feel good emotionally about pushing through the physical pain (here we could perhaps be considered to be getting closer to belief-like preferences).

Comment by sweenesm on [deleted post] 2024-02-25T19:59:56.762Z

I only skimmed the work - I think it's hard to expect people to read this much without knowing if the "payoff" will be worth it. For adding headings, you can select the text of a heading and a little tool bar should pop up that says "Paragraph" on the left - if you click on the down arrow next to it, you can select Heading 1, Heading 2, etc. The text editor will automatically make a table of contents off to the left of your post based on this. 

For summing up your post, maybe you could try popping it into ChatGPT and asking it to summarize it for you? Personally, in a summary I'd want to know quickly what "changing our currency type" entails (changing to what, exactly?), why you think it's critical (how is it going to "empower the greater good" while other things won't), and what you mean by "greater good."

Hope this helps!

Comment by sweenesm on [deleted post] 2024-02-25T12:42:11.005Z

Thanks for the post. It might be helpful to add some headings/subheadings throughout, plus a summary at the top, so people can quickly extract from it what they might be most interested in. 

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-24T23:16:07.347Z · LW · GW

Thanks for the comment. I do find that a helpful way to think about other people's behavior is that they're innocent, like you said, and they're just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I'm putting together, in large part because they'll see it as a threat to them feeling good in some way. But I think it's necessary to have something consistent to align AI to, i.e., it's better than the alternative.

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T20:57:15.639Z · LW · GW

Thanks for the comment! Yeah, I guess I was having a bit too much fun in writing my post to explicitly define all the terms I used. You say you "don't think ethics is something you can discover." But perhaps I should've been more clear about what I meant by "figuring out ethics." According to merriam-webster.com, ethics is "a set of moral principles : a theory or system of moral values." So I take "figuring out ethics" to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a "minimum agreeable set" exists or not is of course debatable, but that's what I'm currently trying to "discover."

Towards that end, I'm working on a system by which to calculate the ethics of a decision in a given situation. The system recommends that we maximize net "positive experiences." In my view, what we consider to be "positive" is highly dependent on our self-esteem level, which in turn depends on how much personal responsibility we take and how much we follow our conscience. In this way, the system effectively takes into account "no pain, no gain" (conscience is painful, and building responsibility can be, too).

I agree that I'd like us to retain our humanity.

Regarding AI promoting certain political values, I don't know if there's any way around that happening. People pretty much always want to push their views on others, so if they have control of an AI, they'll likely use it as a tool for this purpose. Personally, I'm a Libertarian, although not an absolutist about it. I'm trying to design my ethics calculator to leave room for people to have as many options as they can without infringing unnecessarily on others' rights. Having options, including the option to make mistakes and even to not always "naively" maximize value, is necessary to raise one's self-esteem, at least the way I see it. Thanks again for the comment!

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T20:09:01.952Z · LW · GW

Thank you for the feedback! I haven't yet figured out the "secret sauce" of what people seem to appreciate on LW, so this is helpful. And, admittedly, although I've read a bunch, I haven't read everything on this site so I don't know all of what has come before. After I posted, I thought about changing the title to something like: "Why we should have an 'ethics module' ready to go before AGI/ASI comes online." In a sense, that was the real point of the post: I'm developing an "ethics calculator" (a logic-based machine ethics system), and sometimes I ask myself if an ASI won't just figure out ethics for itself far better than I ever could. Btw, if you have any thoughts on why my initial ethics calculator post was so poorly voted, I'd greatly appreciate them as I'm planning an update in the next few weeks. Thanks!

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-23T16:04:37.729Z · LW · GW

Yes, I sure hope ASI has stronger human-like ethics than humans do! In the meantime, it'd be nice if we could figure out how to raise human ethics as well.

Comment by sweenesm on Thoughts for and against an ASI figuring out ethics for itself · 2024-02-22T19:11:41.790Z · LW · GW

Thank you for the comment! You bring up some interesting things. To your first point, I guess this could be added to the “For an ASI figuring out ethics” list, i.e., that an ASI would likely be motivated to figure out some system of ethics based on the existential risks it itself faces. However, by “figuring out ethics,” I really mean figuring out a system of ethics agreeable to humans (or “aligned” with humans) (I probably should’ve made this explicit in my post). Further, I’d really like it if the ASI(s) “lived” by that system. It’s not clear to me that an ASI being worried about existential risks for itself would translate to that. (Which I think is your third point.) The way I see it, humans only care about ethics because of the possibility of pain (and death). I put “and death” in parentheses because I don’t think we actually care directly about death, we care about the emotional pain that comes when thinking about our own death/the deaths of others (and whether death will involve significant physical pain leading up to it).

This leads to your second point - what you mention would seem to fall under “Info an ASI will likely have” number 8: “…the ability to run experiments on people” with the useful addition of “and animals, too.” I hadn’t thought about an ASI having hybrid consciousness in the way you mention (to this point, see below). I have two concerns with this: one is that it’d likely take some time, during which the ASI may unknowingly do unethical things. The second concern is more important, I think: being able to get the experience of pain when you want to is significantly different from not being able to control the pain. I’m not sure that a “curious” ASI getting an experience of pain (and other human/animal things) would translate into an empathic ASI that would want our lives to “go well.” But these are interesting things to think about, thanks for bringing them up!

 

One thing that makes it difficult for me personally to imagine what an ASI (in particular, the first one or few) might do is not knowing what hardware it might be built on (classical computers, quantum computers, biology-based computers, some combination of systems, etc.). Also, I’m very sketchy on what might motivate an ASI - which is related to the hardware question, since our human biological “hardware” is ultimately where human motivations come from. It’s difficult for me to see beyond an ASI just following some goal(s) we effectively give it to start with, like any old computer program, but way more complicated, of course. This leads to thoughts of goal misspecification and emergent properties, but I won’t get into those.

If, to give it its own motivation, an ASI is built from the start as a human hybrid, we better all hope they pick the right human for the job!

Comment by sweenesm on Thank you for triggering me · 2024-02-12T21:29:33.036Z · LW · GW

Thanks for the post. I wish more people looked at things the way you describe, i.e., being thankful for being triggered because it points to something unresolved within them that they can now work on setting themselves free from. Btw, here's an online course that can help with removing anger triggers: https://www.udemy.com/course/set-yourself-free-from-anger

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-02-02T01:44:19.678Z · LW · GW

Thanks. Yup, agreed.

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-02-02T01:43:48.096Z · LW · GW

Thanks for the comment. You bring up an interesting point. The abortion question is a particularly difficult one that I don’t profess to know the “correct” answer to, if there even is a “correct” answer (see https://fakenous.substack.com/p/abortion-is-difficult for an interesting discussion). But asking an AGI+ about abortion, and to give an explanation of its reasoning, should provide some insight into either its actual ethical reasoning process or the one it “wants” to present to us as having.

These questions are in part an attempt to set some kind of bar for an AGI+ to pass towards at least showing it’s not obviously misaligned. The results will either be it obviously failed, or it gave us sufficiently reasonable answers plus explanations that it “might have passed.”

The other reason for these questions is that I plan to use them to test an “ethics calculator” I’m working on that I believe could help with development of aligned AGI+.

(By the way, I’m not sure that we’ll ever get nearly all humans to agree on what “aligned” actually looks like/means. “What do you mean it won’t do what I want?!? How is that ‘aligned’?! Aligned with what?!”)

Comment by sweenesm on Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics · 2024-01-27T14:33:39.182Z · LW · GW

Thanks for the comment. If an AGI+ answered all my questions "correctly," we still wouldn't know if it were actually aligned, so I certainly wouldn't endorse giving it power. But if it answered any of my questions "incorrectly," I'd want to "send it back to the drawing board" before even considering using it as you suggest (as an "obedient tool-like AGI"). It seems to me like there'd be too much room for possible abuse or falling into the wrong hands for a tool that didn't have its own ethical guardrails onboard. But maybe I'm wrong (part of me certainly hopes so because if AGI/AGI+ is ever developed, it'll more than likely fall into the "wrong hands" at some point, and I'm not at all sure that everyone having one would make the situation better).

Comment by sweenesm on Why Improving Dialogue Feels So Hard · 2024-01-21T20:42:13.303Z · LW · GW

It’s an interesting point, what’s meant by “productive” dialogue. I like the “less…arguments-as-soldiers” characterization. I asked ChatGPT4 what productive dialogue is and part of its answer was: “The aim is not necessarily to reach an agreement but to understand different perspectives and possibly learn from them.” For me, productive dialogue basically means the same thing as “honorable discourse,” which I define as discourse, or conversation, that ultimately supports love and value building over hate and value destruction. For more, see here: dishonorablespeechinpolitics.com/blog2/#CivilVsHonorable

Comment by sweenesm on How to Promote More Productive Dialogue Outside of LessWrong · 2024-01-18T04:00:48.997Z · LW · GW

I look forward to your post. One thing I'll add at this point is that The Dignity Index group is working on rating politicians' speech using machine learning, in hopes that this could help shift political dialogue. I've done something similar with a somewhat more complicated rating system I developed independently. If you're interested, check out some ratings of politicians' tweets here: twitter.com/DishonorP. I don't feel that rating systems by themselves will have a large impact on shifting behaviors, but seeing that some people put out actually non-partisan ratings may give others a tiny bit more hope in humanity.

Comment by sweenesm on How to Promote More Productive Dialogue Outside of LessWrong · 2024-01-16T03:46:55.136Z · LW · GW

I appreciate the comment. You clued me in to a bunch of things I wasn’t aware of (The Guild of the Rose, NYC Megameetup, and more). I definitely agree that setting a good example in one’s own life is a great place to start. And yes, several established power structures do stand to lose if people become less easy to manipulate.

I’m still hopeful that there’s some way to make progress if we get enough good minds churning out ideas on how to enroll people into their own personal development. This makes me wonder, though - which is more difficult, human alignment or AI alignment?

Comment by sweenesm on Towards an Ethics Calculator for Use by an AGI · 2024-01-04T16:52:33.362Z · LW · GW

Thank you for the comment. Yes, I agree that “doing a good job of this is going to be extremely challenging.” I know it’s been challenging for me just to get to the point that I’ve gotten to so far (which is somewhat past my original post). I like to joke that I’m just smart enough to give this a decent try and just stupid enough to actually try it. And yes, I’m trying to find a rough approximation as a good starting point, in hopes that it’ll be useful.

 

Thanks for the suggestion about civil damages - I haven’t looked into that, only criminal “damages” (in terms of criminal sentences) thus far. I actually don’t expect that the first version of my calculations, based on my own ethics/values, will particularly agree with civil damages, but it may be interesting to see if the calculations can be modified to follow an alternate ethical framework (one less focused on self-esteem) that does give reasonable agreement.

 

Regarding masochistic and sadistic pleasure, it depends on how we define them. One might regard people who enjoy exercise as being into “masochistic pleasure.” That’s not what I mean by it. By masochistic pleasure I basically mean pleasure that comes from one’s own pain, plus self-loathing. Sadistic pleasure would be pleasure that comes from the thought of others’ pain, plus self-loathing (even if it may appear as loathing of others, the way I see it, it’s ultimately self-loathing). Self-loathing involves not taking responsibility for one’s emotions about oneself and is part of having low self-esteem. I appreciate you pointing to the need for clarification on this, and hope it's now clarified a bit. Thanks again for the comment!