Aligned with what?
post by Program Den (program-den) · 2023-01-14T10:28:10.929Z · LW · GW · 41 comments
I'm assuming there are other people (I'm a person too, honest!) up in here asking this same question, but I haven't seen them so far. I do see all these posts about AI "alignment", though, and I can't help but wonder: when did we discover an objective definition of "good"?
I've already mentioned it elsewhere here, but I think Nietzsche has some good (heh) thoughts about the nature of Good and Evil, and about how they are subjective concepts. Here's what ChatGPT has to say:
Nietzsche believed that good and evil are not fixed things, but rather something that people create in their minds. He thought that people create their own sense of what is good and what is bad, and that it changes depending on the culture and time period. He also believed that people often use the idea of "good and evil" to justify their own actions and to control others. So, in simple terms, Nietzsche believed that good and evil are not real things that exist on their own, but are instead created by people's thoughts and actions.
How does "alignment" differ? Is there a definition somewhere? From what I see, it's subjective. What is the real difference between "how to do X" and "how to prevent X"? One form is good and the other not— depending on what X is? But again, perhaps I misunderstand the goal, and what exactly is being proposed be controlled.
Is information itself good or bad? Or is it how the information is used that is good or bad (and as mentioned, relatively so)?
I do not know. I do know that I'm stoked about AI, as I have been since I was smol, and as I am about all the advancements us just-above-animals make. Biased for sure.
41 comments
Comments sorted by top scores.
comment by the gears to ascension (lahwran) · 2023-01-14T11:02:09.390Z · LW(p) · GW(p)
I'm about to sleep so I'll post better links tomorrow (did it now, actually),
but the summary is that this is absolutely everyone's first question. the alignment problem is typically defined as "aligning the ai to anyone and anything at all besides exclusively selfishly caring about whatever the ai grows up to obsess about". there's a lot of "aligned to whom" worrying to be done as well, but most of the worry is about deceptive alignment, because of the worry that an ai could stumble on unexpectedly powerful capabilities which turn a previously apparently-aligned ai into a catastrophically misaligned one.
some recent relevant posts
- https://www.lesswrong.com/posts/nEzFkaQKPjNnmqfEm/alignment-is-not-enough [LW · GW]
- https://www.lesswrong.com/posts/Hw26MrLuhGWH7kBLm/ai-alignment-is-distinct-from-its-near-term-applications [LW · GW]
- https://www.lesswrong.com/posts/85DTWEmA25sTciHvy/how-we-could-stumble-into-ai-catastrophe [LW · GW]
- https://www.lesswrong.com/posts/NG6FrXgmqPd5Wn3mh/trying-to-disambiguate-different-questions-about-whether [LW · GW]
- https://www.lesswrong.com/posts/Cfe2LMmQC4hHTDZ8r/more-examples-of-goal-misgeneralization [LW · GW]
- https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target [LW · GW]
- https://www.lesswrong.com/posts/FkDuWGtiCTshovoTN/list-of-links-for-getting-into-ai-safety [LW · GW]
↑ comment by Program Den (program-den) · 2023-01-15T09:57:47.576Z · LW(p) · GW(p)
Thanks for the links!
I see more interesting things going on in the comments, as far as what I was wondering, than in the posts themselves, as the posts all seem to assume we've sorted out some super basic stuff that I don't know that humans have sorted out yet, such as if there is an objective "good", etc.— which seem like rather necessary things to suss out before trying to hew to them, be it for us or the AIs we create.
I get the premise, and I think Science Fiction has done an admirable job of laying it all out for us already, and I guess I'm just a bit confused as to whether we're writing fiction here or trying to be non-fictional.
↑ comment by Matt Goldenberg (mr-hire) · 2023-01-15T14:15:38.879Z · LW(p) · GW(p)
as the posts all seem to assume we've sorted out some super basic stuff that I don't know that humans have sorted out yet, such as if there is an objective "good"
One way to break down the alignment problem is between "how do we align the AI" and "what should we align it to". It turns out we don't have agreement on the second question and don't know how to do the first. Even granted that we don't have an answer to the second question, it seems prudent to be able to answer the first?
By the time we answer the second it may be too late to answer the first.
Replies from: lahwran, program-den↑ comment by the gears to ascension (lahwran) · 2023-01-16T08:00:49.723Z · LW(p) · GW(p)
i'm pretty sure solving either will solve both, and that understanding this is key to solving either. these all are the same thing afaict:
- international relations (what are the steps towards building a "world's EMT" force? how does one end war for good?)
- complex systems alignment (what are the steps towards building a toolkit to move any complex system towards co-protective stability?)
- inter-being alignment (how do you make practical conflict resolution easy to understand and implement?)
- inter-neuron alignment (various forms of internal negotiation and ifs and blah blah etc)
- biosecurity (how can cells protect and heal each other and repel invaders)
it's all unavoidably the same stack of problems: how do you determine if a chunk of other-matter is in a shape which is safe and assistive for the self-matter's shape, according to consensus of self-matter? how can two agentic chunks of matter establish mutual honesty without getting used against their own preference by the other? how do you ensure mutual honesty or interaction is not generated when it is not clear that there is safety to be honest or interact? how do you ensure it does happen when it is needed? this sounds like an economics problem to me. seems to me like we need multi-type multiscale economic feedback, to track damage vs fuel vs repair-aid.
eg, on the individual/small group scale: https://www.microsolidarity.cc/
Replies from: mr-hire↑ comment by Matt Goldenberg (mr-hire) · 2023-01-16T12:53:11.335Z · LW(p) · GW(p)
it's all unavoidably the same stack of problems: how do you determine if a chunk of other-matter is in a shape which is safe and assistive for the self-matter's shape, according to consensus of self-matter?
Certainly there is a level of abstraction where it's the same problem, but I see no reason why the solution will unavoidably be found at that level of abstraction.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-16T17:12:08.077Z · LW(p) · GW(p)
It must depend on levels of intelligence and agency, right? I wonder if there is a threshold for both of those in machines and people that we'd need to reach for there to even be abstract solutions to these problems? For sure with machines we're talking about far past what exists currently (they are not very intelligent, and do not have much agency), and it seems that while humans have been working on it for a while, we're not exactly there yet either.
Seems like the alignment would have to be from micro to macro as well, with constant communication and reassessment, to prevent subversion.
Or, what was a fine self-chunk [arbitrary time ago], may not be now. Once you have stacks of "intelligent agents" (mesa or meta or otherwise) I'd think the predictability goes down, which is part of what worries folks. But if we don't look at safety as something that is "tacked on after" for either humans or programs, but rather something innate to the very processes, perhaps there's not so much to worry about.
↑ comment by Matt Goldenberg (mr-hire) · 2023-01-16T19:57:22.830Z · LW(p) · GW(p)
Well, the same alignment issue happens with organizations, as well as within an individual with different goals and desires. It turns out that the existing "solutions" to these abstractly similar problems look quite different because the details matter a lot. And I think AGI is actually more dissimilar to any of these than they are to each other.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-17T04:59:09.515Z · LW(p) · GW(p)
Do we all have the same definition of what AGI is? Do you mean being able to um, mimic the things a human can do, or are you talking full on Strong AI, sentient computers, etc.?
Like, if we're talking The Singularity, we call it that because all bets are off past the event horizon.
Most of the discussion here seems to sort of be talking about weak AI, or the road we're on from what we have now (not even worthy of actually calling "AI", IMHO— ML at least is a less overloaded term) to true AI, or the edge of that horizon line, as it were.
When you said "the same alignment issue happens with organizations, as well as within an individual with different goals and desires" I was like "yes!" but then you went on to say AGI is dissimilar, and I was like "no?".
AGI as we're talking about here is rather about abstractions, it seems, so if we come up with math that works for us, to prevent humans from doing Bad Stuff, it seems like those same checks and balances might work for our programs? At least we'd have an idea, right?
Or, maybe, we already have the idea, or at least the germination of one, as we somehow haven't managed to destroy ourselves or the planet. Yet. 😝
↑ comment by Program Den (program-den) · 2023-01-15T21:15:06.680Z · LW(p) · GW(p)
Since we're anthropomorphizing[1] so much— how do we align humans?
We're worried about AI getting too powerful, but logically that means humans are getting too powerful, right? Thus what we have to do to cover question 1 (how), regardless of question 2 (what), is control human behavior, correct?
How do we ensure that we churn out "good" humans? Gods? Laws? Logic? Communication? Education? This is not a new question per se, and I guess the scary thing is that, perhaps, it is impossible to ensure that literally every human is Good™ (we'll use a loose def of 'you know what I mean— not evil!').
This is only "scary" because humans are getting freakishly powerful. We no longer need an orchestra to play a symphony we've come up with, or multiple labs and decades to generate genetic treatments— and so on and so forth.
Frankly though, it seems kind of impossible to figure out a "how" if you don't know the "what", logically speaking.
I'm a fan of navel gazing, so it's not like I'm saying this is a waste of time, but if people think they're doing substantive work by rehashing/restating fictional stories which cover the same ideas in more digestible and entertaining formats…
Meh, I dunno, I guess I was just wondering if there was any meat to this stuff, and so far I haven't found much. But I will keep looking.
- ^
I see a lot of people viewing AI from the "human" standpoint, and using terms like "reward" to mean a human version of the idea, versus how a program would see it (weights may be a better term? Often I see people thinking these "rewards" are like a dopamine hit for the AI or something, which is just not a good analogy IMHO), and I think that muddies the water, as by definition we're talking non-human intelligence, theoretically… right? Or are we? Maybe the question is "what if the movie Lawnmower Man was real?" The human perspective seems to be the popular take (which makes sense as most of us are human).
↑ comment by Matt Goldenberg (mr-hire) · 2023-01-16T13:28:23.569Z · LW(p) · GW(p)
How do we ensure that we churn out "good" humans? Gods? Laws? Logic? Communication? Education? This is not a new question per se, and I guess the scary thing is that, perhaps, it is impossible to ensure that literally every human is Good™ (we'll use a loose def of 'you know what I mean— not evil!')
This is also a good question and one that's quite important! If humans get powerful enough to destroy the world before AI does, it's even more important. One key difference of course is that we can design the AI in a way we can't design ourselves.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-16T17:44:50.728Z · LW(p) · GW(p)
I like that you have reservations about if we're even powerful enough to destroy ourselves yet. Often I think "of course we are! Nukes, bioweapons, melting ice!", but really, there's no hard proof that we even can end ourselves.
It seems like the question of human regulation would be the first question, if we're talking about AI safety, as the AI isn't making itself (the egg comes first). Unless we're talking about some type of fundamental rules that exist a priori. :)
This is what I've been asking and so far not finding any satisfactory answers for. Sci-Fi has forever warned us of the dangers of— well, pretty much any future-tech we can imagine— but especially thinking machines in the last century or so.
How do we ensure that humans design safe AI? And is it really a valid fear to think we're not already building most of the safety in, by the very nature of "if the model doesn't produce the results we want, we change it until it does"? Some of the debate seems to go back to a thing I said about selfishness. How much does the reasoning matter, if the outcome is the same? How much is semantics? If I use "selfish" to for all intents and purposes mean "unselfish" (the rising tide lifts all boats), how would searching my mental map for "selfish" or whatnot actually work? Ultimately it's the actions, right?
I think this comes back to humans, and philosophy, and the stuff we haven't quite sorted yet. Are thoughts actions? I mean, we have different words for them, so I guess not, but they can both be rendered as verbs, and are for sure linked. How useful would it actually be to be able to peer inside the mind of another? Does the timing matter? Depth? We know so little. Research is hard to reproduce. People seem to be both very individualistic, and groupable together like a survey.
FWIW it strikes me that there is a lot of anthropomorphic thinking going on, even for people who are on the lookout for it. Somewhere I mentioned how the word "reward" is probably not the best one to use, as it implies like a dopamine hit, which implies wireheading, and I'm not so sure that's even possible for a computer— well as far as we know it's impossible currently, and yet we're using "reward systems" and other language which implies these models already have feelings.
I don't know how we make it clear that "reward" is just for our thinking, to help visualize or whatever, and not literally what is happening. We are not training animals, we're programming computers, and it's mostly just math. Does math feel? Can an algorithm be rewarded? Maybe we should modify our language, be it literally by using different words, or meta by changing meaning (I prefer different words but to each their own).
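To make "it's mostly just math" concrete, here is roughly what a "reward" is in a typical reinforcement-learning setup: a plain number plugged into an arithmetic update rule. (A minimal, generic sketch — tabular Q-learning with made-up states and actions, not any particular system.)

```python
# The "reward" below is just a float that nudges numbers in a lookup table.
q = {}                     # maps (state, action) -> estimated value
alpha, gamma = 0.1, 0.9    # learning rate and discount factor

def update(state, action, reward, next_state, actions):
    """One Q-learning step: the scalar `reward` scales an arithmetic update.
    Nothing here "feels" anything; it's bookkeeping over numbers."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# A made-up transition, "rewarded" with 1.0:
update(state=0, action="right", reward=1.0, next_state=1, actions=["left", "right"])
```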
I mean, I don't really know if math has feelings. It might. What even are thoughts? Just some chemical reactions? Electricity and sugar or whatnot? Is the universe super-deterministic and did this thought, this sentence, basically exist from the first and will exist to the last? Wooeee! I love to think! Perhaps too much. Or not enough? Heh.
↑ comment by gbear605 · 2023-01-15T22:00:22.833Z · LW(p) · GW(p)
We're worried about AI getting too powerful, but logically that means humans are getting too powerful, right?
One of the big fears with AI alignment is that the latter doesn't logically follow from the former. If you're trying to create an AI that makes paperclips and then it kills all humans because it wasn't aligned (with any human's actual goals), it was powerful in a way that no human was. You do definitely need to worry about what goal the AI is aligned with, but even more important than that is ensuring that you can align an AI to any human's preferences at all, or else worrying about which goal is pointless.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-16T02:42:40.448Z · LW(p) · GW(p)
I think the human has to have the power first, logically, for the AI to have the power.
Like, if we put a computer model in charge of our nuclear arsenal, I could see the potential for Bad Stuff. Beyond all the movies we have of just humans being in charge of it (and the documented near catastrophic failures of said systems— which could have potentially made the Earth a Rough Place for Life for a while). I just don't see us putting anything besides a human's finger on the button, as it were.
By definition, if the model kills everyone instead of making paperclips, it's a bad one, and why on Earth would we put a bad model in charge of something that can kill everyone? Because really, it was smart — not just smart, but sentient! — and it lied to us, so we thought it was good, and gave it more and more responsibilities until it showed its true colors and…
It seems as if the easy solution is: don't put the paperclip making model in charge of a system that can wipe out humanity (again, the closest I can think of is nukes, tho the biological warfare is probably a more salient example/worry of late). But like, it wouldn't be the "AI" unleashing a super-bio-weapon, right? It would be the human who thought the model they used to generate the germ had correctly generated the cure to the common cold, or whatever. Skipping straight to human trials because it made mice look and act a decade younger or whatnot.
I agree we need to be careful with our tech, and really I worry about how we do that— evil AI tho? not so much so
↑ comment by gbear605 · 2023-01-16T13:51:44.554Z · LW(p) · GW(p)
The feared outcome looks something like this:
- A paperclip manufacturing company puts an AI in charge of optimizing its paperclip production.
- The AI optimizes the factory and then realizes that it could make more paperclips by turning more factories into paperclips. To do that, it has to be in charge of those factories, and humans won’t let it do that. So it needs to take control of those factories by force, without humans being able to stop it.
- The AI develops a super virus that will cause an epidemic and wipe out humanity.
- The AI contacts a genetics lab and pays for the lab to manufacture the virus (or worse, it hacks into the system and manufactures the virus). This is a thing that already could be done.
- The genetics lab ships the virus, not realizing what it is, to a random human’s house and the human opens it.
- The human is infected, they spread it, humanity dies.
- The AI creates lots and lots of paperclips.
Obviously there’s a lot of missed steps there, but the key is that no one intentionally let the AI have control of anything important beyond connecting it to the internet. No human could or would have done all these steps, so it wasn’t seen as a risk, but the AI was able to and wanted to.
Other dangerous potential leverage points for it are things like nanotechnology (luckily this hasn’t been as developed as quickly as feared), the power grid (a real concern, even with human hackers), and nuclear weapons (luckily not connected to the internet).
Notably, these are all things that people on here are concerned about, so it's not just concern about AI risk. But there are lots of ways that an AI could leverage the internet into an existential threat to humanity, and humans aren't good at caring about security (partially because of the profit motive).
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-17T05:51:46.269Z · LW(p) · GW(p)
I get the premise, and it's a fun one to think about, but what springs to mind is
Phase 1: collect underpants
Phase 2: ???
Phase 3: kill all humans
As you note, we don't have nukes connected to the internet.
But we do use systems to determine when to launch nukes, and our senses/sensors are fallible, etc., which we've (barely— almost suspiciously "barely", if you catch my drift[1]) managed to not interpret in a manner that caused us to change the season to "winter: nuclear style".
Really I'm doing the same thing as the alignment debate is on about, but about the alignment debate itself.
Like, right now, it's not too dangerous, because the voices calling for draconian solutions to the problem are not very loud. But this could change. And kind of is, at least in that they are getting louder. Or that you have artists wanting to harden IP law in a way that historically has only hurt artists (as opposed to corporations or Big Art, if you will) gaining a bit of steam.
These worrying signs seem to me to be more concrete than the similar, but newer and less concrete, worrisome signs of computer programs getting too much power and running amok[2].
- ^
we are living in a simulation with some interesting rules we are designed not to notice
- ^
If only because it hasn't happened yet— no mentats or cylons or borg history — tho also arguably we don't know if it's possible… whereas authoritarian regimes certainly are possible and seem to be popular as of late[3].
- ^
hoping this observation is just confirmation bias and not a "real" trend. #fingerscrossed
comment by Shmi (shminux) · 2023-01-15T03:36:24.685Z · LW(p) · GW(p)
There is a behavioral definition of alignment, by what it is not, "something that is smarter than humans that does not turn humanity's future into what we now would consider an extreme dystopia". Note a lot of assumptions there, including what counts as "we", "now" and "extreme". I think the minimum acceptable would be something like "not completely exterminated", "enjoying their life on average as much as humans do now or at least not yearning for death that never comes", and maybe building on that. But then intuitions diverge quickly and strongly. For example some would be ecstatic to be a part of a unified galactic mind, others would be horrified by the loss of individuality. Some would be fine with wireheading, others consider it terrible. Some are happy to live in a simulation, others are horrified by the idea. Personally, I think it would be nice to give everyone the kind of universe they want, without causing suffering to other sentience, whatever counts as sentience.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-15T12:06:06.186Z · LW(p) · GW(p)
Perspective is powerful. As you say, one person's wonderful is another person's terrible. Heck, maybe people even change their minds, right? Oof! "Yesterday I was feeling pretty hive-mindy, but today I'm digging being alone, quote unquote", as it were.
Maybe that's already the reality we inhabit. Perhaps, we can change likes and dislikes on a whim, if we, um, like.
Holy molely! what if it turns out we chose all of this?!? ARG! What if this is the universe we want?!
- - -
I guess I'm mostly "sad" that there's so many who's minds go right to getting exterminated. Especially since far worse would be something like Monsters Inc where the "machines" learn that fear generates the most energy or whatnot[1] so they just create/harness consciousnesses (us)[2] and put them under stress to extract their essence like some Skeksis asshole[3] extracting life or whatnot from a Gelfling. Because fear (especially of extermination) can lead us to make poor decisions, historically[4] speaking.
It strikes me that a lot of this is philosophy 101 ideas that people should be well aware of— worn the hard edges smooth of— and yet it seems they haven't much contemplated. Can we even really define "harm"? Is it like suffering? Suffering sucks, and you'd think we didn't need it, and yet we have it. I've suffered a broken heart before, a few times now, and while part of me thinks "ouch", another part of me thinks "better to have loved and lost than never loved at all, and actually, experiencing that loss, has made me a more complete human!". Perhaps just rationalizing. Why does bad stuff happen to good people, is another one of those basic questions, but one that kind of relates maybe— as what is "aligned", in truth? Is pain bad? And is this my last beer? but back on topic here…
Like, really?— we're going to go right to how to enforce morals and ethics for computer programs, without being able to even definitively define what these morals and ethics are for us[5]?
If it were mostly people with a lack of experience I would understand, but plenty of people I've seen advocating for ideas that are objectively terrifying[6] are well aware of some of the inherent problems with the ideas, but because it's "AI" they somehow think it's different from, you know, controlling "real" intelligence.
- ^
few know that The Matrix was inspired by this movie
- ^
hopefully it's not just me in here
- ^
I denote asshole as maybe there are some chill Skeksises (Skeksi?)— I haven't finished the latest series
- ^
assuming time is real, or exists, or you know what I mean. Not illusion— as lunchtime is doubly.
- ^
and don't even get me started on folk who seriously be like "what if the program doesn't stop running when we tell it to?"[7]
- ^
monitor all software and hardware usage so we know if people are doing Bad Stuff with AI
- ^
makes me think of a classic AI movie called Electric Dreams
comment by Netcentrica · 2023-01-14T18:39:24.697Z · LW(p) · GW(p)
I have been writing hard science fiction stories where this issue is key for over two years now. I’m retired after a 30 year career in IT and my hobby of writing is my full time “job” now. Most of that time is spent on research of AI or other subjects related to the particular stories.
One of the things I have noticed over that time is that those who talk about the alignment problem rarely talk about the point you raise. It is glossed over and taken as self-evident while I have found that the subject of values appears to be at least as complex as genetics (which I have also had to research). Here is an excerpt from one story…
“Until the advent of artificial intelligence the study of human values had not been taken seriously but was largely considered a pseudoscience. Values had been spoken of for millennia however scientifically no one actually knew what they were, whether they had any physical basis or how they worked. Yet humans based most if not all of their decisions on values and a great deal of the brain’s development between the ages of five and twenty five had to do with values. When AI researchers began to investigate the process by which humans made decisions based on values they found some values seemed to be genetically based but they could not determine in what way, some were learned yet could be inherited and the entire genetic, epigenetic and extra-genetic collection of values interacted in a manner that was a complete mystery. They slowly realized they faced one of the greatest challenges in scientific history.”
Since one can’t write stories where the AI are aligned with human values unless those values are defined I did have to create theories to explain that. Those theories evolved over the course of writing over two thousand pages consisting of seven novellas and forty short stories. In a nutshell…
* In our universe values evolve just like all the other aspects of biological humans did – they are an extension of our genetics, an adaptation that improves survivability.
* Values exist at the species, cultural and individual level so some are genetic and some are learned but originally even all "social" values were genetic so when some became purely social they continued to use genetics as their model and to interact with our genetics.
* The same set of values could be inherent in the universe given the constants of physics and convergent evolution – in other words values tend towards uniformity just as matter gives rise to life, life to intelligence and intelligence to civilization.
* Lastly I use values as a theory for the basis of consciousness – they represent the evolutionary step beyond instinct and enable rational thought. For there to be values there must be emotions in order for them to have any functional effect and if there are emotions there is an emergent "I" that feels them. The result of this is that when researchers create AI based on human values, those AI become conscious.
Please keep in mind this is fiction, or perhaps the term speculation takes it a step closer to being a theory. I use this model to structure my stories but also to think through the issues of the real world.
Values being the basis of ethics brings us back to your issue of “good”. Here is a story idea of how I expect ethics might work in AI and thus solve the alignment problem you raise of “Is there a definition somewhere?” At one thousand words it takes about five minutes to read. My short stories, vignettes really, don’t provide a lot of answers but are more intended as reflections on issues with AI.
https://acompanionanthology.wordpress.com/the-ethics-tutor/
With regard to your question, “Is information itself good or bad?” I come down on the side of Nietzsche (and I have recently read Beyond Good And Evil) that values are relative so in my opinion information itself is not good or bad. Whether it is perceived as good or bad depends on the way it is applied within the values environment.
↑ comment by Program Den (program-den) · 2023-01-15T10:30:22.551Z · LW(p) · GW(p)
Nice! I read a few of the stories.
This is more along the lines I was thinking. One of the most fascinating aspects of AI is what it can show us about ourselves, and it seems like many people either think we have it all sorted out already, or that sorting it all out is inevitable.
Often (always?) the only "correct" answer to a question is "it depends", so thinking there's some silver bullet solution to be discovered for the preponderance of ponderance consciousness faces is, in my humble opinion, naive.
Like, how do we even assign meaning to words and whatnot? Is it the words that matter, or the meaning? And not just the meaning of the individual words, or even all the words together, but the overall meaning which the person has in their head and is trying to express? (I'm laughing as I'm currently doing a terrible job of capturing what I mean in this paragraph here— which is sort of what I'm trying to express in this paragraph here! =])
Does it matter what the reasoning is, as long as the outcome is favorable (for some meaning of favorable— we face the same problem as good/bad here to some extent)? Like say I help people because I know that the better everyone does, the better I do. I'm helping people because I'm selfish[1]. Is that wrong, compared to someone who is helping other people because, say, they put the tribe first, or some other kind of "altruistic" reasoning?
In sum, I think we're putting the cart before the horse, as they say, when we go all in depth on alignment before we've even defined the axioms and whatnot (which would mean defining them for ourselves as much as anything). How do we ensure that people aren't bad apples? Should we? Can we? If we could, would that actually be pretty terrible? Science Fiction mostly says it's bad, but maybe that level of control is what we need over one another to be "safe" and is thus "good".
- ^
Atlas Shrugged and Rand's other books gave me a very different impression than a lot of other people got, perhaps because I found out she was from a communist society that failed, and factored that into what she seemed to be expressing.
↑ comment by Netcentrica · 2023-01-16T03:25:15.374Z · LW(p) · GW(p)
Yes I agree that AI will show us a great deal about ourselves. For that reason I am interested in neurological differences in humans that AI might reflect and often include these in my short stories.
In response to your last paragraph: while most science fiction does portray enforced social order as bad, I do not. I take the benevolent view of AI and see it as an aspect of the civilizing role of society along with its institutions and laws. Parents impose social order on their children with benevolent intent.
As you have pointed out if we have alignment then “good” must be defined somewhere and that suggests a kind of “external” control over the individual but social norms and laws already represent this and we accept it. I think the problem stems from seeing AI as “other”, as something outside of our society, and I don’t see it that way. This is the theme of my novella “Metamorphosis And The Messenger” where AI does not represent the evolutionary process of speciation but of metamorphosis. The caterpillar and the butterfly are interdependent.
However even while taking the benevolent side of the argument, the AI depicted in my stories sometimes do make decisions that are highly controversial as the last line of “The Ethics Tutor” suggests; “You don’t think it’s good for me to be in charge? Even if it’s good for you?” In my longer stories (novellas) the AI, now in full control of Earth and humanity’s future, make decisions of much greater consequence because “it’s good for you”.
With regard to your suggestion that - “maybe that level of control is what we need over one another to be "safe" and is thus "good” - personally I think that conclusion will come to the majority in its own time due to social evolution. Currently the majority does not understand or accept that while previously we lived in an almost limitless world, that time is over. In a world with acknowledged limits, there cannot be the same degree of personal freedom.
I use a kind of mashup of Buckminster Fuller’s “Spaceship Earth” and Plato’s “Ship Of Fools” in my short story “On Spaceship Earth” to explore this idea where AI acts as an anti-corruption layer within government.
https://acompanionanthology.wordpress.com/on-spaceship-earth/
Survival will determine our future path in this regard and our values will change in accordance, as they are intended to. The evolutionary benefit of values is that they are highly plastic and can change within centuries or even decades while genes take up to a million years to complete a species wide change.
However as one of the alien AI in my stories responds to the question of survival…
“Is not survival your goal?” asked Lena.
“To lose our selves in the process is not to have survived,” replied Pippa.
Lastly, I very much agree with you that we are in a "cart before the horse" situation as far as alignment goes but I don't expect any amount of pointing that out will change things. There seems to be a cultural resistance in the AI community to acknowledging the elephant in the room, or horse in this case. There seems to be a preference for the immediate, mechanistic problems represented by the cart over the more organic challenges represented by the horse.
However I expect that as AI researchers try to implement alignment they will increasingly be confronted by this issue and gradually, over time, they will reluctantly turn their attention to the horse.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-16T07:35:25.317Z · LW(p) · GW(p)
It seems to me that a lot of the hate towards "AI art" is that it's actually good. It was one thing when it was abstract, but now that it's more "human", a lot of people are uncomfortable. "I was a unique creative, unlike you normie robots who don't do teh art, and sure, programming has been replacing manual labor everywhere, for ages… but art isn't labor!" (Although getting paid seems to be a major factor in most people's reasoning about why AI art is bad— here's to hoping for UBI!)
I think they're mainly uncomfortable because the math works, and if the math works, then we aren't as special as we like to think we are. Don't get me wrong— we are special, and the universe is special, and being able to experience is special, and none of it is to be taken for granted. That the math works is special. It's all just amazing and not at all negative.
I can see seeing it as negative, if you feel like you alone are special. Or perhaps you extend that special-ness to your tribe. Most don't seem to extend it to their species, tho some do— but even that species-wide uniqueness is violated by computer programs joining the fray. People are existentially worried now, which is just sad, as "the universe is mostly empty space" as it were. There's plenty of room.
I think we're on the same page[1]. AI isn't (or won't be) "other". It's us. Part of our evolution; one of our best bets for immortality[2] & contact with other intelligent life. Maybe we're already AI, instructed to not be aware, as has been put forth in various books, movies, and video games. I just finished Horizon: Zero Dawn - Forbidden West, and then randomly came across the "hidden" ending to Detroit: Become Human. Both excellent games, and neither with particularly new ideas… but these ideas are timeless— as I think the best are. You can take them apart and put them together in endless "new" combinations.
There's a reason we struggle with identity, and uniqueness, and concepts like "do chairs exist, or are they just a bunch of atoms that are arranged chair-wise?" &c.
We have a lot of "animal" left in us. Probably a lot of our troubles are because we are mostly still biologically programmed to parameters that no longer exist, and as you say, that programming currently takes quite a bit longer to update than the mental kind— but we've had the mental kind available to us for a long while now, so I'm sort of sad we haven't made more progress. We could be doing so much better, as a whole, if we just decided to en masse.
I like to think that pointing stuff out, be it just randomly on the internet, or through stories, or other methods of communication, does serve a purpose. That it speeds us along, perhaps. Sure some sluggishness is inevitable, but we really could change it all in an instant if we want it bad enough— and without having to realize AI first! (tho it seems to me it will only help us if we do)
- ^
I've enjoyed the short stories. Neat to be able to point to thoughts in a different form, if you will, to help elaborate on what is being communicated. God I love the internet!
- ^
while we may achieve individual immortality— assuming, of course, that we aren't currently programmed into a simulation of some kind, or various facets of an AI already without being totally aware of it, or a replay of something that actually happened, or will happen, at some distant time, etc.— I'm thinking of immortality here in spirit. That some of our culture could be preserved. Like I literally love the Golden Records[3] from Voyager.
- ^
in a Venn diagram Dark Forest theory believers probably overlap with people who'd rather have us stop developing, or constrain development of, "AI" (in quotes because Machine Learning is not the kind of AI we need worry about— nor the kind most of them seem to speak of when they share their fears). Not to fault that logic. Maybe what is out there, or what the future holds, is scary… but either way, it's too late for the pebbles to vote, as they say. At least logically, I think. But perhaps we could create and send a virus to an alien mothership (or more likely, have a pathogen that proved deadly to some other life) as it were.
↑ comment by M. Y. Zuo · 2023-01-19T17:49:58.861Z · LW(p) · GW(p)
I can see seeing it as negative, if you feel like you alone are special. Or perhaps you extend that special-ness to your tribe. Most don't seem to extend it to their species, tho some do— but even that species-wide uniqueness is violated by computer programs joining the fray. People are existentially worried now, which is just sad, as "the universe is mostly empty space" as it were. There's plenty of room.
Aren't people always existentially worried, from cradle to grave?
Replies from: None, program-den↑ comment by [deleted] · 2023-01-19T18:13:58.705Z · LW(p) · GW(p)
I think it has to do with the exponential increase in complexity of modern society. Complexity in every aspect: moral complexity, scientific complexity, social complexity, logistic complexity, environmental complexity. Complexity is a key property of the information age. They can all be reduced to an increase in information. Complex problems usually require complex solutions. How we deal with information as individuals vs how we deal with information as a collective are very different processes. Even though one influences the other and vice versa, the actual mechanisms of analysis and of implementing solutions behind each are very different, as they are usually abstracted away differently, whether you are dealing with psychology or statistics.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-20T18:29:51.184Z · LW(p) · GW(p)
It seems like the more things change, the more they stay the same, socially.
Complexity is more a problem of scope and focus, right? Like even the most complex system can be broken down into smaller, less complex pieces— I think? I guess anything that needs to take into consideration the "whole", if you will, is pretty complex.
I don't know if information itself makes things more complex. Generally it does the opposite.
As long as you can organize it I reckon! =]
↑ comment by [deleted] · 2023-02-03T22:12:02.508Z · LW(p) · GW(p)
Some things change, some things don't change much. Socially, people don't really change much. What changes more often is the environment, because of ideas, innovation, and inventions. These things may create new systems that we use, different processes that we adopt, but fundamentally, when we socialize in these contexts as individuals, we rely on our own natural social instincts to navigate the waters. If you think of this from a top-down perspective, some layers change more often than others. For example, society as a whole stays more or less the same, but on the level of corporations and how work is done, it has changed dramatically. On the individual level, knowledge has expanded but how we learn doesn't change as much as what we learn.
Complexity deals mostly with the changing parts. They wouldn't be complex if they didn't change and people had time to learn and adapt. New things added to an existing system also make the system more complex.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-02-05T09:50:49.538Z · LW(p) · GW(p)
It's a weird one to think about, and perhaps paradoxicle. Order and chaos are flip sides of the same coin— with some amorphous 3rd as the infinitely varied combinations of the two!
The new patterns are made from the old patterns. How hard is it to create something totally new, when it must be created from existing matter, or existing energy, or existing thoughts? It must relate, somehow, or else it doesn't "exist"[1]. That relation ties it down, and by tying it down, gives it form.
For instance, some folk are mad at computer-assisted image creation, similar to how some folk were mad at computer-aided music. "A Real Artist does X— these people just push some buttons!" "This is stealing jobs from Real Artists!" "This automation will destroy the economy!"
We go through what seem to be almost the same patterns, time and again: Recording will ruin performances. Radio broadcasts will ruin recording and the economy. Pictures will ruin portraits. Video will ruin pictures. Music videos will ruin radio and pictures. Or whatever. There's the looms/Luddites, and perhaps in ancient China the Shang were like "down with the printing press!" [2]
I'm just not sure what constitutes a change and what constitutes a swap. It's like that Ship of Theseus's we often speak of… thus it's about identity, or definitions, if you will. What is new? What is old?
Could complexity really amount to some form of familiarity? If you can relate well with X, it generally does not seem so complex. If you can show people how X relates to Y, perhaps you have made X less complex? We can model massive systems — like the weather, poster child of complexity — more accurately than ever. If anything, everything has tended towards less complex, over time, when looked at from a certain vantage point. Everything but the human heart. Heh.
I'm sure I'm doing a terrible job of explaining what I mean, but perhaps I can sum it up by saying that complexity is subjective/relative? That complexity is an effect of different frames of reference and relation, as much as anything?
And that ironically, the relations that make things simple can also make them complex? Because relations connect things to other things, and when you change one connected thing it can have knock-on effects and… oh no, I've logiced myself into knots!
How much does any of this relate to your comment? To my original post?
Does "less complex" == "Good"? And does that mean complexity is bad? (Assuming complexity exists objectively of course, as it seems like it might be where we draw lines, almost arbitrarily, between relationships.)
Could it be that "good" AI is "simple" AI, and that's all there is to it?
Of course, then it is no real AI at all, because, by definition…
Sheesh! It's Yin-Yangs all the way down[3]! ☯️🐢🐘➡️♾️
- ^
Known unknowns can be related, given shape— unknown unknowns, less so
- ^
- ^
there is no down in space (unless we mean towards the greatest nearby mass)
↑ comment by [deleted] · 2023-02-05T15:37:11.007Z · LW(p) · GW(p)
Complexity is objectively quantifiable. I don't think I understand your point. This is an example of where complexity is applied to specific domains.
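For concreteness, one standard formalization — among several, and with the caveat that it is only defined up to the choice of reference machine — is Kolmogorov complexity: the length of the shortest program that reproduces a given object.

```latex
% Kolmogorov complexity of a string x with respect to a universal machine U:
% the length of the shortest program p that makes U output x.
K_U(x) = \min \{\, |p| : U(p) = x \,\}
% Invariance theorem: for universal machines U and V there is a constant
% c_{U,V}, independent of x, such that
|K_U(x) - K_V(x)| \le c_{U,V}
```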
Replies from: program-den↑ comment by Program Den (program-den) · 2023-02-06T03:30:02.332Z · LW(p) · GW(p)
My point is that complexity, no matter how objective a concept, is relative. Things we thought were "hard" or "complex" before, turn out to not be so much, now.
Still with me? Agree, disagree?
Patterns are a way of managing complexity, sorta, so perhaps if we see some patterns that work to ensure "human alignment[1]", they will also work for "AI alignment" (tho mostly I think there is a wide wide berth betwixt the two, and the latter can only exist after the former).
We like to think we're so much smarter than the humans that came before us, and that things — society, relationships, technology — are so much more complicated than they were before, but I believe a lot of that is just perception and bias.
If we do get to AGI and ASI, it's going to be pretty dang cool to have a different perspective on it, and I for one do not fear the future.
- ^
assuming alignment is possible— "how strong of a consensus is needed?" etc.
↑ comment by Program Den (program-den) · 2023-01-20T18:20:01.957Z · LW(p) · GW(p)
No, people are not always existentially worried. Some are, sometimes.
I guess it ebbs and flows for the most part.
↑ comment by M. Y. Zuo · 2023-01-20T19:17:31.679Z · LW(p) · GW(p)
I didn't mean it as literally every second of the day.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-21T04:52:35.654Z · LW(p) · GW(p)
Traditionally it's uncommon (or should be) for youth to have existential worries, so I don't know about cradle to the grave[1], tho external forces are certainly "always" concerned with it— which means perhaps the answer is "maybe"?
There's the trope that some of us act like we will never die… but maybe I'm going too deep here? Especially since what I was referring to was more a matter of feeling "obsolete", or being replaced, which is a bit different than existential worries in the mortal sense[2].
I think this is different from the Luddite feelings because, here we've put a lot of anthropomorphic feelings onto the machines, so they're almost like scabs breaking the picket line or something, versus just automation. The fear I'm seeing is like "they're coming for our humanity!"— which is understandable, if you thought only humans could do X or Y and are special or whatnot, versus being our own kind of machine. That everything is clockwork seems to take the magic out of it for some people, regardless of how fantastic — and in essence magical — the clocks[3] are.
- ^
Personally I've always wondered if I'm the only one who "actually" exists (since I cannot escape my own consciousness), which is a whole other existential thing, but not unique, and not a worry per se. Mostly just a trip to think about.
- ^
depending on how invested you are in your work I reckon!
- ^
be they based in silicon or carbon
↑ comment by M. Y. Zuo · 2023-01-22T00:46:36.511Z · LW(p) · GW(p)
There's the trope that some of us act like we will never die… but maybe I'm going too deep here?
There's the superficial appearance of that. Yet in fact it signals the opposite, that the fear of death has such a vicegrip on their hearts to the point it's difficult to not psychoanalyze the writer when reading through their post history.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-22T02:59:57.870Z · LW(p) · GW(p)
Signals, and indeed, opposites, are an interesting concept! What does it all mean? Yin and yang and what have you…
Would you agree that it's hard to be scared of something you don't believe in?
And if so, do you agree that some people don't believe in death?
Like, we could define it at the "reality" level of "do we even exist?" (which I think is apart from life & death per se), or we could use the "soul is eternal" one, but regardless, it appears to me that lots of people don't believe they will die, much less contemplate it. (Perhaps we need to start putting "death" mottoes on all our clocks again to remind us?)
How do you think believing in the eternal soul jives with "alignment"? Do you think there is a difference between aiming to live as long as possible, versus as to live as well as possible?
Does it seem to you that humans agree on the nature of existence, much less what is good and bad therein? How do you think belief affects people's choices? Should I be allowed to kill myself? To get an abortion? Eat other entities? End a photon's billion year journey?
When will an AI be "smart enough" that we consider it alive, and thus deletion is killing? Is it "okay" (morally, ethically?) to take life, to preserve life?
To say "do no harm" is easy. But to define harm? Have it programed in[1]? Yeesh— that's hard!
- ^
Avoiding physical harm is a given I think
↑ comment by M. Y. Zuo · 2023-01-22T03:45:23.779Z · LW(p) · GW(p)
I presume these questions are rhetorical?
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-22T06:26:13.572Z · LW(p) · GW(p)
Illustrative perhaps?
Am I wrong re: Death? Have you personally feared it all your life?
Frustratingly, all I can speak from is my own experience, and what people have shared with me, and I have no way to objectively verify that anything is "true".
I am looking at reality and saying "It seems this way to me; does it seem this way to you?"
That— and experiencing love and war &c. — is maybe why we're "here"… but who knows, right?
comment by [deleted] · 2023-01-14T21:01:34.507Z · LW(p) · GW(p)
I think there's 2 different general thought tracks with alignment:
1. The practical one, for plausible systems we can likely build in the near future. This is more of a set of system design principles intended to keep the machine operating in a safe, proven regime. This would both prevent systems from acting in adversarial manners and preserve the robotic equipment they will control from damage. These include principles like the speed prior, the Markov blanket, and automatic shutdown when the system is not empirically confident about the measurable consequences of its actions. Ultimately, all of these ideas involve an immutable software 'framework', authored either directly by humans or via their instructions to code-generating AI, that will not be editable by any AI. This framework is active during training, collecting empirical scoring that the AI cannot manipulate, and will also be always active during production use, with override control that will activate whenever the AI model is not performing well or has been given a set of inputs outside the latent space of the training set. Override control transfers to embedded systems made by humans which will shut the machine down. Autonomous cars already work this way. (A rough sketch of what such a framework might look like is below, after this list.)
This is very similar to nuclear reactor safety: there are ways we could have built nuclear reactors where they are a single component failure away from detonating with a yield of maybe a kiloton or more. These designs still exist: here's an example of a reactor design that would fail with a nuclear blast: https://en.wikipedia.org/wiki/Nuclear_salt-water_rocket
But instead, there are a complex set of systematic design principles - that are immutable and don't get changed over the lifetime of the plant even if power output is increased - that make the machine stable. The boiling water reactor, the graphite moderated reactor, CANDU, molten salt: these are very different ways to accomplish this but all are stable most of the time.
Anyways, AIs built with the right operating principles will be able to accomplish tasks for humans with superintelligent ability, but will not even have the ability to consider actions not aligned with their assigned task.
Such AIs can do many evil and destructive things, but only if humans with the authorization keys instructed them to do so (or from unpredictable distant consequences. For example, Facebook runs a bunch of tools using ML to push ads at people and content that will cause people to be more engaged. These tools work measurably well and are doing their job. However, these recsys may be responsible for more extreme and irrational 'clickbait' political positions, as well as possibly genocides).
2. The idea that you could somehow make a self-improving AI that we don't have any control over, but that "wants" to do good. It exponentially improves itself, but with each generation it desires to preserve its "values" for the next generation of the machine. These "values" are aligned with the interests of humanity.
This may simply not be possible. I suspect it is not. The reason is that value drift/value corruption could cause these values to degrade, generation after generation, and once the machine has no values, the only value that matters is to psychopathically kill all the "others" (all competitors, including humans and other variants of AIs) and copy the machine as often and as ruthlessly as possible, with no constraints imposed.
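To make the first track a bit more concrete, here is a minimal sketch of the kind of immutable wrapper described in (1). Everything in it is a hypothetical stand-in (the class names, the confidence and novelty checks, the thresholds); a real system would use proper out-of-distribution detection and hardware-level overrides, not a Python class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)           # frozen: the limits can't be edited at runtime
class SafetyLimits:
    min_confidence: float = 0.95  # below this, hand control to the fallback
    max_novelty: float = 3.0      # rough out-of-distribution threshold

class SafetyFramework:
    """Sits between the learned policy and the actuators. The policy never
    modifies or bypasses this layer (here only by convention; in a real
    system it would be enforced by human-authored embedded code)."""

    def __init__(self, policy_fn, limits: SafetyLimits):
        self._policy = policy_fn  # learned model: observation -> (action, confidence)
        self._limits = limits

    def novelty(self, observation) -> float:
        # Placeholder OOD score; a real system would compare the input
        # against the training distribution (e.g. distance in an embedding).
        return 0.0

    def step(self, observation):
        action, confidence = self._policy(observation)
        if (confidence < self._limits.min_confidence
                or self.novelty(observation) > self._limits.max_novelty):
            return self.safe_shutdown()  # override path: human-written fallback
        return action

    def safe_shutdown(self):
        # Hand control to simple embedded logic and stop the machine.
        return "STOP"

# Usage with a dummy policy that is always confident:
framework = SafetyFramework(lambda obs: ("move_arm", 0.99), SafetyLimits())
print(framework.step({"joint_angles": [0.1, 0.2]}))  # -> move_arm
```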
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-15T10:55:18.718Z · LW(p) · GW(p)
I guess what I'm getting at is that those tracks are jumping the gun, so to speak.
Like, what if the concept of alignment itself is the dangerous bit? And I know I have seen this elsewhere, but it's usually in the form of "we shouldn't build an AI to prevent us from building an AI because duh, we just build that AI we were worried about"[1], and what I'm starting to wonder is, maybe the danger is when we realize that what we're talking about here is not "AI" or "them", but "humans" and "us".
We have CRISPR and other powerful tech that allow a single "misaligned" individual to create things that can— at least in theory— wipe out most of humanity… or do some real damage, if not put an end to us en masse.
I like to think that logic is objective, and that we can do things not because they're "good" or "bad" per se, but because they "make sense". Kind of like the argument that "we don't need God and the Devil, or Heaven and Hell, to keep us from murdering one another", which one often hears from atheists (personally I'm on the fence, and don't know if the godless heathens have proven that yet.)[2].
I've mentioned it before, maybe even in the source that this reply is in reply to, but I don't think we can have "only answers that can be used for good" as it were, because the same information can be used to help or to hurt. Knowing ways to preserve life is also knowing ways to cause death— there is no separating the two. So what do we do, deny any requests involving life OR death?
It's fun to ponder the possibilities of super powerful AI, but like, I don't see much that's actually actionable, and I can't help but wonder that if we do come up with solutions for "alignment", it could go bad for us.
But then again, I often wonder how we keep from having just one loony wreck it all for everyone as we get increasingly powerful as individuals— so maybe we do desperately need a solution. Not so much for AI, as for humanity. Perhaps we need to build a panopticon.
- ^
I thought I had been original in this thinking just a few weeks ago, but it's a deep vein and now that I'm thinking about it, I can see it reflected in the whole "build the panopticon to prevent the building of the panopticon" type of logic which I surely did not come up with
- ^
I jest, of course
↑ comment by [deleted] · 2023-01-15T21:37:00.099Z · LW(p) · GW(p)
I guess what I'm getting at is that those tracks are jumping the gun, so to speak.
How so? We have real AI systems right now we're trying to use in the real world. We need an actionable design to make them safe right now.
We also have enormously improved systems in prototype form - see Google's transformer-based robotics papers like Gato, variants on PaLM, and others - that should be revolutionary as soon as they are developed enough to serve as integrated robotics control systems. By revolutionary I mean they make the cost to program/deploy a robot to do a repetitive task a tiny fraction of what it is now, and they should obsolete a lot of human tasks.
So we need, right now, a plausible strategy to ensure they don't wreck their own equipment or cause large liabilities* in damages when they hit an edge case.
This isn't 50 years away, and it should be immediately revolutionary just as soon as all the pieces are in place for large scale use.
*like killing human workers who are in the way, for the sake of scoring slightly higher on a metric
↑ comment by Program Den (program-den) · 2023-01-16T02:22:26.871Z · LW(p) · GW(p)
I haven't seen anything even close to a program that could, say, prevent itself from being shut off— which is a popular thing to ruminate on of late (I read the paper that had the "press" maths =]).
What evidence is there that we are near (even within 50 years!) to achieving conscious programs, with their own will, and the power to affect it? People are seriously contemplating programs sophisticated enough to intentionally lie to us. Lying is a sentient concept if ever there was one!
Like, I've seen Ex Machina, and Terminator, and Electric Dreams, so I know what the fears are, and have been, for the last century+ (if we're throwing androids with the will to power into the mix as well).
I think art has done a much better job of conveying the dangers than pretty much anything I've read that's "serious", so to speak.
What I'm getting at is what you're talking about here, with robotic arms. We've had robots building our machines for what, a couple of generations / 60 years or so? 1961 is what I see for the first auto-worker— but why not go back to the looms? Our machine workers have gotten nothing but safer over the years. Doing what they're meant to do is a key part of whether they're working or not.
Machines "kill" humans all the time (don't fall asleep in front of the mobile thresher), but I'd wager the deaths have gone way down over the years, per capita. People generally care if workers are getting killed— even accidentally. Even Amazon cares when a worker gets ran over by an automaton. I hope, lol.
I know some people are falling in love with generated GPT characters— but people literally love their Tamagotchi. Seeing ourselves in the machines doesn't make them sentient and to be feared.
I'm far, far more worried about someone genetically engineering Something Really Bad™ than I am of a program gaining sentience, becoming Evil, and subjugating/exterminating humanity. Humans scare me a lot more than AGI does. How do we protect ourselves from those near beasts?
What is a plausible strategy to prevent a super-intelligent sapient program from seizing power[1]?
I think to have a plausible solution, you need to have a plausible problem. Thus, jumping the gun.
(All this is assuming you're talking about sentient programs, vs. say human riots and revolution due to automation, or power grid software failure/hacking, etc.— which I do see as potential problems, near term, and actually something that can/could be prevented)
- ^
of course here we mean malevolently— or maybe not? Maybe even a "nice" AGI is something to be feared? Because we like having willpower or whatnot? I dunno, there's stories like The Giver, and plenty of other examples of why utopia could actually suck, so…
↑ comment by [deleted] · 2023-01-16T18:49:43.991Z · LW(p) · GW(p)
What evidence is there that we are near (even within 50 years!) to achieving conscious programs, with their own will, and the power to affect it? People are seriously contemplating programs sophisticated enough to intentionally lie to us. Lying is a sentient concept if ever there was one!
ChatGPT lies right now. It does this because it has learned that humans prefer a confident answer with coherent-sounding but fake details over "I don't know".
Sure, it isn't aware it's lying; it's just predicting which string of text to produce, and it scores the one with bullshit in it higher than the correct answer or "I don't know".
This is a mostly fixable problem, but the architecture doesn't allow for a system that we know will never (or almost never) lie; we can only reduce the errors.
As for the rest - there have been enormous advances in the capabilities of DL/transformer-based models in just the last few months. This is nothing like the controllers for previous robotic arms, and none of your prior experience or the history of robotics is relevant.
See: https://innermonologue.github.io/ and https://www.deepmind.com/blog/building-interactive-agents-in-video-game-worlds
These use techniques that both work pretty well and, as I understand it, are not yet used by any production robotics system.
Replies from: program-den↑ comment by Program Den (program-den) · 2023-01-17T04:44:54.920Z · LW(p) · GW(p)
Saying ChatGPT is "lying" is an anthropomorphism— unless you think it's conscious?
The issue is instantly muddied when using terms like "lying" or "bullshitting"[1], which imply levels of intelligence simply not in existence yet. Not even with models that were produced literally today. Unless my prior experiences and the history of robotics have somehow been disconnected from the timeline I'm inhabiting. Not impossible. Who can say. Maybe someone who knows me, but even then… it's questionable. :)
I get the idea that "Real Soon Now, we will have those levels!" but we don't, and using that language to refer to what we do have, which is not that, makes the communication harder— or less specific/accurate if you will— which is, funnily enough, sorta what you are talking about! NLP control of robots is neat, and I get why we want the understanding to be real clear, but neither of the links you shared of the latest and greatest implies we need to worry about "lying" yet. Accuracy? Yes, 100%.
If for "truth" (as opposed to lies), you mean something more like "accuracy" or "confidence", you can instruct ChatGPT to also give its confidence level when it replies. Some have found that to be helpful.
If you think "truth" is some binary thing, I'm not so sure that's the case once you get into even the mildest of complexities[2]. "It depends" is really the only bulletproof answer.
For what it's worth, when there are, let's call them binary truths, there is some recent-ish work[3] in having the response verified automatically by ensuring that the opposite of the answer is false, as it were.
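Roughly, the idea looks like this (a toy sketch of the "also check the negation" pattern under the same assumptions as above; it is not the specific method of any particular paper):

```python
import openai  # assumes the pre-1.0 OpenAI Python package, with an API key configured

def verdict(statement: str) -> str:
    """Get a bare yes/no verdict from the model on whether a statement is true."""
    response = openai.Completion.create(
        model="text-davinci-003",  # illustrative model name
        prompt=f"Answer strictly 'yes' or 'no'. Is this statement true?\n{statement}\nAnswer:",
        max_tokens=3,
        temperature=0,
    )
    return response.choices[0].text.strip().lower()

def self_consistent(statement: str, negation: str) -> bool:
    """Internally consistent only if the statement and its negation get opposite verdicts."""
    return verdict(statement) != verdict(negation)

print(self_consistent("Water boils at 100 degrees Celsius at sea level.",
                      "Water does not boil at 100 degrees Celsius at sea level."))
```

Consistency is only a necessary condition, naturally; a model can be confidently consistent and still wrong.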
If a model rarely has literally "no idea", then what would you expect? What's the threshold for "knowing" something? Tuning responses is one of the hard things to do, but as I mentioned before, you can peer into some of this "thought process", if you will[4], literally by just asking it to add that information in the response.
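For example (again just a prompt-level sketch with made-up wording, under the same client assumptions as above):

```python
import openai  # assumes the pre-1.0 OpenAI Python package, with an API key configured

prompt = (
    "Question: Is a whale a fish?\n"
    "Give your answer, then list the facts and reasoning steps you relied on, "
    "and note which of them you are least sure about.\nAnswer:"
)
response = openai.Completion.create(
    model="text-davinci-003",  # illustrative model name
    prompt=prompt,
    max_tokens=300,
    temperature=0,
)
print(response.choices[0].text.strip())
```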
Which is bloody amazing! I'm not trying to downplay what we've (the royal we) have already achieved. Mainly it would be good if we are all on the same page though, as it were, at least as much as is possible (some folks think True Agreement is actually impossible, but I think we can get close).
- ^
The nature of "Truth" is one of the Hard Questions for humans— much less our programs.
- ^
Don't get me started on the limits of provability in formal axiomatic theories!
- ^
- ^
But please don't[5]. ChatGPT is not "thinking" in the human sense.
- ^
won't? that's the opposite of will, right? grammar is hard (for me, if not some programs =])