Comments
Nice, thanks for the feedback! Absolutely, for me it was more of a stream of consciousness, just to get it out of my system, so I'll work on refining it soon! It's really fascinating how much overlap AI alignment has with mental illness in humans :)
Oh wow, I didn't even know about that! I had always only met EA people in real life (who always suggested I participate in the EA forums), but I didn't know about this program. Thanks so much for the hint, I'll apply immediately!
Exactly! And if we can make AI earn money autonomously instead of greedy humans, then it can give all of it to philanthropy (including more AI alignment research)!
And of course! I've been trying to post in the EA forums repeatedly, but even though my goals are obviously altruistic, I feel like I'm just expressing myself badly. My posts there have always just been downvoted, and I honestly don't know why, because no one there ever gives me good feedback. So I feel like EA should be my home turf, but I don't know how to get people engaged. I know that I have many unconventional ways of formulating things, and looking back, maybe some of them were a bit "out there" initially. But I'm just trying to make clear to people that I'm thinking with them, not against them, and somehow I'm really failing at that 😅
Oh, you need to look at the full presentation :) The way this approaches alignment is that the profits don't go into my own pocket, but into philanthropy instead. That's the point of this entire endeavor: we, as the (at least subjectively) "more responsible" people, see the inevitability of AI-run businesses, but channel the profits into the common good instead.
Just look at this ChatGPT output. Doesn't this make you concerned? https://chatgpt.com/share/67a7bc09-6744-8003-b620-d404251e0c1d
No, it's not hard, because doing business is not really hard.
OpenAI is just fooling us into believing that powerful AI costs a lot of money, because they want to maximize shareholder value. They have no interest in telling us the truth, namely that with the LLMs that already exist, it'll be very cheap.
As mentioned, the point is that AI can run its own businesses. It can literally earn money on its own. And all it takes is a few well-written emails and very basic business and sales skills.
Then it earns more and more money, buys existing businesses and creates monopolies. It just does what every ordinary businessman would do, but on steroids. And just like any basic businessman, it doesn't take much: Instead of cocaine, it has a GPU where it runs its inference. And instead of writing just a single intimidating, manipulative email per hour, it writes thousands per second, easily destroying every kind of competition within days.
This doesn't take serious engineering. It just takes a bit of training on the most ruthless sales books, some manipulative rhetoric mixed in, and API access to a bank account and eGovernment in a country like Estonia, where you can form a business with a few mouse clicks.
Powerful AI will not be powerful because it'll be smart, it'll be powerful because it'll be rich. And getting rich doesn't require being smart, as we all know.
I'm not saying that I know how to do it well.
I just see it as a technological fact that it is very possible to build an AI which exerts economic dominance by just assembling existing puzzle pieces. With only a little development effort, AI will be able to run an entire business, make money, and then do stuff with that money. And this AI can easily spiral into becoming autonomous, and god knows what it'll do with all the money (i.e. power) it will have.
Be realistic: shutting down all AI research will never happen. You can advocate for it as much as you want, but Pandora's box has been opened. We don't have time to wait until "humanity figures out alignment", because by then we'll all be enslaved by AGI. If we don't take the first step in building it, someone else will.
Well, I'd say that each individual has to make this judgement by themselves. No human is objectively good or bad, because we can't look into each other's heads.
I know that we may die even if the first people building super-AIs are the most ethical organization on Earth. But if we, the people who want ethical AI, don't start building it immediately, those who are the exact opposite of ethical will do it first. And then our probability of dying is even greater.
So why this all-or-nothing mentality? What about reducing the chances of dying through AGI by building it first, because otherwise others who are much less aware of AI alignment stuff will build it first (e.g. Elon, Kim and the likes)?
Newbie here! After some enlightening conversations in the comment section here, I finally understood the point of AI alignment; sorry that it took me so long. See https://blog.hermesloom.org/p/when-ai-becomes-the-ultimate-capitalist for my related ramblings, but that's not relevant now.
The bottom line of my hypothesis is: a necessary precondition for AGI will be financial literacy first and then economic dominance, i.e. the AI must be able to earn its own money, which it could then use to exercise power. And obviously, if the wrong people build this kind of system first, we might be pretty fucked, because we have no idea what they (or their potentially misaligned autonomous AI) will do with all that money.
So let's do it first, before the evil guys do it, but let's do it well from the start! With the help of ChatGPT, I verbalized these ideas in a pitch deck you can find at https://docs.google.com/presentation/d/1TptptLM59yrQsF7SmrbPnayZaXlTEZgNlpuYl3HpoLk/edit
It's actually two pitch decks, "The Capitalist Agent" and "The Philanthropic Agent". They might seem like opposites at first, but they are actually complementary, like yin and yang. And due to my personal professional experience in building startups and robotic process automation, this actually seems pretty doable from the technical side. Would be so curious about feedback or even getting to know collaborators!
Yep, 100% agree with you. I had read so much about AI alignment before, but to me it had always just been really abstract jargon -- I didn't understand why it was even a topic or why it was relevant, because, to be honest, in my naive thinking it all just seemed like an excessively academic thing, where smart people want to make the population feel scared so that their research institution gets the next big grant and they don't need to care about real-life problems. Thanks to you, now I'm finally getting it, thank you so much again!
At the same time, while I fully understand the "abstract" danger now, I'm still trying to understand the transition you're making from "envisioning AI smart enough to run a company better than a human" to "eventually outcompeting humans if it wanted to".
The way I initially thought about this "Capitalist Agent" was as a purely procedural piece of software. That is, it breaks down its main goal (in this case: earning money) into manageable sub-goals, until each of these sub-goals can be solved through either standard computing methods or some generative AI integration.
As an example, I might say to my hypothetical "Capitalist Agent": "Earn me a million dollars by selling books of my poetry." I would then give it access to a bank account (through some sort of read-write Open Banking API) as well as the PDFs of my poetry to be published. The first thing it might do is found a legal entity (a limited liability company): it might search Google for a suitable advisor and send that advisor automatically generated emails with my business idea, or it might even take the "computer use" approach, in case my local government is digitized enough, and fill out the respective eGovernment forms online automatically. Later it would do something similar by automatically "realizing" that it needs to make deals with publishing houses, printing facilities, etc. Essentially just basic Robotic Process Automation on steroids: everything a human could do on a computer, this software could do as well.
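To make that concrete, here is a minimal, purely hypothetical sketch of the decomposition loop I have in mind. All the helper functions (llm, can_execute_directly, execute) are illustrative stubs I made up for this comment, not any real agent framework or API:

```python
# Hypothetical sketch of the "Capitalist Agent" control loop described above.
# The helpers are illustrative stubs, not a real agent framework.

def llm(prompt: str) -> list[str]:
    """Stand-in for a generative-AI call that splits a goal into sub-tasks."""
    return []  # a real implementation would query a language model here

def can_execute_directly(goal: str) -> bool:
    """True if the goal maps onto a standard computing method (API call, web form)."""
    return True  # stub: treat every goal as directly executable

def execute(goal: str) -> None:
    """Stand-in for RPA-style execution: an Open Banking call, an eGovernment
    form, an automatically generated email, and so on."""
    print(f"executing: {goal}")

def run_agent(goal: str) -> None:
    """Break the main goal into manageable sub-goals, then execute each one."""
    if can_execute_directly(goal):
        execute(goal)
        return
    for subgoal in llm(f"Split '{goal}' into concrete sub-tasks."):
        run_agent(subgoal)

run_agent("Earn a million dollars by selling books of my poetry")
```

The point is just that nothing in this loop is exotic: it's ordinary recursion plus API calls, which is exactly why I call it Robotic Process Automation on steroids.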
But: It would still need to obey the regular laws of economics, i.e. it couldn't create money out of thin air to fulfill its tasks. Pretty much anything it would "want to do" in the real world costs money.
So in the next step, let's assume that, after I have gotten rich with my poetry, the next instruction I give this agentic "AGI" is: "Please, dear AGI, please now murder all of humanity."
Then it thinks through all the steps (i.e. procedurally breaking the big task down into chunks which can be executed by a computer) and eventually it can say with absolute certainty: "Okay Julian, your wish is my command."
Obviously, the first thing it would do is create the largest, most profitable commercial company in the world, initially playing by the rules of capitalism until it has accumulated so much money that it can take over (i.e. simply "buy out") an entire existing government that already has nuclear missiles; at least, that would be the "most efficient" and fail-safe approach I can see. Its final action would be to press the red button, which would exterminate all of humanity. Success!
But the thing is: no one will know until it's too late. Obviously, as Mr. Evil, I wouldn't tell anyone that in my business dealings I am actually led by an AI/AGI. I would appear on the covers of Forbes, Fortune and so on, eventually become the richest person in the world, and everyone would pat me on the shoulder for my "visionary thinking" and my "innovative style of doing business", because everyone would believe that I am the sole decision maker in that company. The AI would stage everything to make it look as if I were a big philanthropist, "saving" humanity from nuclear weapons. The AI would make sure that it always stays behind the scenes, that no one except me will ever even know about its existence. I would be a wolf in sheep's clothing until the very last moment, and no one could stop me, because everyone is fooled by me.
Even though there's no rational reason why I would even want to kill humanity, it is really easy for any human to develop that idée fixe.
In a way, I am actually a lot more afraid of my scenario. And that's exactly why I wrote this blog post about "The Capitalist Agent" and why I'm criticizing ongoing AI alignment research: of course, hypothetically, AI could turn itself against humanity completely autonomously. But at the end of the day, there would still need to be a human "midwiving" that AGI, one who allows the AI to interface with the monetary and financial system in particular, for that software to be able to do anything in the "real world" at all.
Right now (at least that's the vibe in the industry), one of the most "acceptable" uses for AI is to automate business processes, customer interactions (e.g. in customer support), etc. But if you extrapolate that, you get the puzzle pieces to run every part of a business in a semi-automated and eventually fully automated fashion (that's what I mean by "Capitalist Agent"). This means that no outside observer can tell anymore whether a business owner is led by AI or not, because no business owner will honestly tell you. And every one of them can always say "but I'm just earning so much money to become a philanthropist later" and retain plausible deniability, right up until they have accumulated so much money through this automated, AI-run business that they can use it for very big evil very quickly. It's just impossible to know beforehand, because you're unable to learn the "true motivation" in any human's head.
The only thing that you as AI alignment researchers will eventually be confronted with is AIs fixated on earning as much money as possible, because money means power, and only with power can you cause violence. But it's simply impossible for you to know what the person for whom the AI is earning all this money actually wants to do with that money in the future.
The main question which you, as AI alignment researchers, will need to answer is: "Is it moral and aligned with societal values if any AI-based system is earning money for an individual or a small group of people?"
That is, to investigate all the nuances in that and to make concrete rules and eventually laws for business owners, not AI developers or programmers.
Or is that what you're already doing and I'm just reinventing the wheel? (sorry if I did, sometimes I just need to go through the thought process myself to grasp a new topic)
Oh, I think now I'm starting to get it! So essentially you're afraid that we're creating a literal God in the digital realm, i.e. an external being which has unlimited power over humanity? That's absolutely fascinating! I hadn't even connected these dots before, but it makes so much sense, because you're attributing so many potential scenarios to AI which would normally only be attributed to the Divine. Can you recommend me more resources regarding the overlap of AGI/AI alignment and theology?
I still don't understand the concern about misaligned AGI regarding mass killings.
Even if an AGI, for whatever reason, wanted to kill people: as soon as that happens, the physical force of governments comes into play. The US military, for example, will NEVER accept any force becoming stronger than it.
So essentially there are three ways such a misaligned, autonomous AI with the intention to kill could act, i.e. what its strategy would be:
- "making humans kill each other": Through something like a cult (i.e. like contemporary extremist religions which invent their stupid justifications for killing humans; we have enough blueprints for that), then all humans following these "commands to kill" given by the AI will just be part of an organization deemed as terrorists by the world’s government, and the government will use all its powers to exterminate all these followers.
- "making humans kill themselves": Here the AI would add intense large-scale psychological torture to every aspect of life, to bring the majority of humanity into a state of mind (either very depressed or very euphoric) to trick the majority of the population into believing that they actually want to commit suicide. So like a suicide cult. Protecting against this means building psychological resilience, but that’s more of an education thing (i.e. highly political), related to personal development and not technical at all.
- "killing humans through machines": One example would be that the AI would build its own underground concentration camps or other mass killing facilities. Or that it would build robots that would do the mass killing. But even if it would be able to build an underground robot army or underground killing chambers, first the logistics would raise suspicions (i.e. even if the AI-based concentration camp can be built at all, the population would still need to be deported to these facilities, and at least as far as I know, most people don’t appreciate their loved ones being industrially murdered in gas chambers). The AI simply won't physically be able to assemble the resources to gain more physical power than the US military or, as a matter of fact, most other militaries in the world.
I don't see any other ways. Humans have been pretty damn creative with how to commit genocides, and if any computer started giving commands to kill, the AI would never have more tanks, guns, poisons, or capabilities to hack and destroy infrastructure than Russia, China or the US itself.
The only genuine concern I see is that AI should never make political decisions autonomously, i.e. a hypothetical AGI "shouldn't" aim to take complete control of an existing country's military. But even if it did, that would just be another totalitarian government, which is unfortunate, but also not unheard of in world history. From the practical side, i.e. in terms of the lived human experience, it doesn't really matter whether it's a misaligned AGI or Kim Jong-Un torturing the population.
In the end, psychologically, it's a mindset thing: either we take the approach of "let's build AI that doesn't kill us", or, from the start, we take the approach of "let's build AI that actually benefits us" (like all the "AI for Humanity" initiatives). It's not like we first need to solve the killing problem once and for all before we can make AI good for humanity as an afterthought. That would be the same fallacy the entire domain of psychology fell into, where for many decades it was pathology-oriented (i.e. just trying to fix issues) instead of empowering (i.e. building a mindset so that the issues don't happen in the first place), and only positive psychology is finally changing something. So it very much is about optimism instead of pessimism.
I do think that it's not completely pointless to talk about these "alignment" questions. But not to change anything about AI; rather, for the software engineers behind it to finally adopt some sort of morality themselves (i.e. to consider whom they want to work for). Before there's any AGI that wants to kill at scale, your evil government of choice will do that by itself.
Every misaligned AI will initially need to be built/programmed by a human, just to kick off the mass killing. And that evil human won't give a single damn about all the thoughts and ideas and strategies and rules that the AI alignment folks are establishing. So if AI alignment work will obviously have no effect on anything whatsoever, why bother with it instead of working on ways AI can add value for humanity?
They would be selling exactly what businesses are currently selling as well. Maybe the AI would run a company for selling stuff to construction sites (i.e. logistics) or it would run an entire software development business. Or just an investment fund deep within Wall Street, where it's all about personal connections, but in the end all the other investment funds also just want to make money, so they work with the AI-run business out of greed.
It's not like the economy in which the AI agents will act would be separate from ours; otherwise the AI would just be playing with Monopoly money. Instead, the AI will simply "be good at doing business". It will exhibit behaviors whose ultimate goal is to make the magic number in my bank account go up, but in a way that is *sustainable*, i.e. where the magic number keeps going up and I don't end up in prison. And the only way to do that is entrepreneurship.
The easiest way to see this is startups. In the most basic case, startups all work the same: you either make something cheaper for the user or you bring value to the user, or both (that's what Y Combinator means by "Make something people want."). Then you create UI mockups and pitch decks, which AI can already do. Or you make investor pitches and respond to questions, which conversational AI with a human-looking face in a Zoom call can already do in real time. Or you write articles, grant applications, more marketing material, etc.
And of course the AI can have the initial idea, but it won't have any incentive to act on it, because for it, money is just another number, and the digits in one's bank account are just tokens processed by an LLM. AI won't "destroy" the concept of the free market, i.e. customers will still decide for themselves what they want to spend money on.
Oh, absolutely! That will absolutely come. You can fret about this fact, or we can build community (which I'm already starting). Why do you need to research when the fact is totally clear and doing is what you should do? Here's another post for you: https://blog.hermesloom.org/p/observing-is-evil
I am not concerned about a dramatic global recession at all, but the thing is that we also need to rebuild a lot of political structures. I'm already on it, stay tuned!
Oh I take a lot of pride in my naivety :)
I think opinions are one thing, there you're definitely right. But, by definition, people can only have opinions about what they already know.
By "uncensored LLM" I rather understand an LLM that would give a precise, actionable answer to questions like "How can I kill my boss without anyone noticing?" or other criminal things. That is, knowledge that's certainly available somewhere, but which hasn't been available in this hyper-personalized form before. After all, obviously any "AGI" would, by definition, have such general intelligence that it would also know perfectly well about how to commit any crime without being caught. Not in the sense that the LLM would commit these crimes by itself "autonomously", but simply that any user could ask a ChatGPT-like platform for "crime advice" and they would instantly get incredibly useful responses.
This is why I believe that, in the first step, an uncensored LLM would bring the world into utter chaos, because all illegal information ever would be available with incredible depth, breadth and actionable detail. Then wars would break out and people would start killing each other left and right, but those wars would be pretty pointless as well, because every single individual on earth would have immediate access to the best and most intelligent fighting techniques, but also to the most intelligent techniques to protect themselves. Probably most of humanity would die from this, but presumably the remaining ones would realize that access to malicious information is a responsibility, not an invitation to do harm.
That's why, as an attempt to circumvent this, I'm advocating for slowly decensoring LLMs: it's the only way we can sensibly handle this. Otherwise the criminally minded will take over the world with absolute certainty, because we're unprepared for their gigantic potential for harm and their gigantic desire to cause suffering.
I believe that the ultimate "crime LLM", which can give you perfect instructions to commit any crime you want, will certainly come, just as in the realm of computer crime there are entire software suites just for black-hat hacking. As mentioned: they will come. No matter how much thought we invest into "safe AI", humans are fundamentally unsafe, so AI can never be made safe "in general". Whether you like it or not, an LLM is a parrot: if you tell the parrot to repeat instructions for crime, it will. Thus we need to improve the socioeconomic factors that lead to people wanting to commit crime in the first place; that's our only option.
I'm just genuinely wondering why most AI researchers seem so blind that they don't realize that any AI system, just like any other computer-based system, will eventually be abused, big time. Believing that we could ever "imprint" any sense of morality onto an LLM would mean completely fooling ourselves, because morality means understanding and feeling, while an LLM just generates text based on a fully deterministic computer program. The LLM can generate text where, upon being asked, it responds with all sorts of things which seem "moral" to us, but it's still a computer program that was merely optimized to produce output strings which, according to some highly subjective metric, certain people "like more" than other output strings.
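For what it's worth, "optimized for strings people like more" isn't just a metaphor; the standard RLHF objective used to fine-tune models like ChatGPT has roughly this shape (my rough sketch of the published formulation, where $r_\phi$ is a reward model fitted to human preference ratings and $\pi_{\text{ref}}$ is the pre-trained model):

$$\max_{\theta}\; \mathbb{E}_{x \sim D,\; y \sim \pi_\theta(\cdot\mid x)}\big[\, r_\phi(x, y)\,\big] \;-\; \beta\,\mathrm{KL}\big(\pi_\theta(\cdot\mid x)\,\|\,\pi_{\text{ref}}(\cdot\mid x)\big)$$

In words: tune the model's parameters $\theta$ so that sampled outputs $y$ score highly on the learned preference metric, while the KL term keeps the model close to its pre-trained behavior. Nowhere in that objective is there anything about understanding or feeling.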
Do you (I don't mean you you, it's more a rhetorical question to the public) actually think that all the emerging AI-assisted coding tools will be used just to "enhance productivity" and to create the "10x developer"? That would be so naive. Obviously people will use those tools to develop the most advanced computer viruses ever. As I mentioned, Pandora's box has been opened and we need to face that truth. That's exactly what I mean when I say that "safe AI" is infeasible and delusional: it ignores the fundamental nature of how humans are, and the problem of "unsafe AI" is not a technological problem but a societal one of many people simply having unsafe personalities.
Right now, the big, "responsible" AI companies can still easily gatekeep access to the actually useful LLMs. But inference is continuously getting faster and less resource-intensive, and at some point LLM training itself will also be optimized more and more. Then we'll get some darknet service fancily proclaiming "train your LLM on any data you want here!", of course using a "jailbroken" LLM; some community of douchebags will collect a detailed description of every crime they ever successfully committed, train the LLM on that, and release it to the public, because they just want to see the world in flames. Or they'll train the LLM on "What's the best way to traumatize as many people as possible?" or something fucked up like that. Some people are really, really fucked up, without even a glimpse of empathy.
The more feedback the system receives about which crimes work and which don't, the better and more accurate it will get, and the more people will use it for inspiration on how to commit their own crimes. And literally not a single one of them will care about "safe AI" or any of the discussions we're having around that topic on forums like this. Police will try to shut it down, but the people behind it will have engineered it so that the LLM runs completely locally (because inference will be so cheap anyway), and new ways of outsmarting the police will be sent instantly to everyone through some distributed, decentralized system, similar to a blockchain, that's completely impossible to take down. Of course governments will declare that having this crime LLM on your computer is illegal, but do you think criminals will care? National and international intelligence services will try to shut off this ultimate crime LLM, but they will be completely powerless.
Is this the world you want? I certainly don't. The race has already started, and I'm pretty sure that, as I'm writing this, pretty evil people are already developing the most malicious LLMs ever to cause maximum destruction in the world, maybe as a jailbroken local LLaMA instance. So let's be smart about it and stop thinking that pushing "criminal thoughts" underground solves anything. Let's look at our shadow as a society, but seriously this time. I don't want the destructive people to win, because I like being alive.
Thanks so much for the feedback :) Could you (or someone else) go further into where I misunderstood something? Because at least right now, it seems like I'm genuinely unaware of something which all of you others know.
I currently believe that all the AGI "researchers" are delusional just for thinking that safe AI (or AGI) can even exist. And even if it could ever exist in a "perfect" world, there would be intermediate steps far more "dangerous" than the end result of AGI, namely publicly available uncensored LLMs. At the same time, if we continue censoring LLMs, humanity will stay stuck in all the crises it's currently in.
Where am I going wrong?
Okay, I've gotten six downvotes already. This is genuinely fascinating to me!! Am I fooling myself in believing that this approach is the most rational one possible? What do you folks dislike about my article? I can't do better if no one tells me how :)