Comments
I imagine it's a sales tactic. Ask for $7 trillion, people assume you believe you're worth that much, and if you've got such a high opinion of yourself, maybe you're right...
In other news, I'm looking to sell a painting of mine for £2 million ;)
This looks fantastic. Hopefully it leads to some great things; I've always found the idea of exploiting the collective intelligence of the masses to be a terribly underused resource, and this reminds me of the game Foldit (and hopefully in the future it will remind me of the wild success that game had in the field of protein folding).
This sounds like it would only work on a machine too dumb to be useful, and if it's that dumb, you can switch it off yourself.
It doesn't help with the convergent instrumental goal of neutralizing threats, because leaving a copy of yourself behind to kill all the humans allows you to be really sure that you're switched off and won't be switched on again.
I really appreciate these.
- Why do some people think that alignment will be easy/easy enough?
- Is there such a thing as 'aligned enough to help solve alignment research'?
I think there's a lot we could learn from climate change activists. Having a tangible 'bad guy' would really help, so maybe we should be framing it more that way.
- "The greedy corporations are gambling with our lives to line their pockets."
- "The governments are racing towards AI to win world domination, and Russia might win."
- "AI will put 99% of the population out of work forever and we'll all starve."
And a better way to frame the issue might be "Bad people using AI" as opposed to "AI will kill us".
If anyone knows of any groups working towards a major public awareness campaign, please let the rest of us know about it. Or maybe we should start our own.
I'm with you on this. I think Yudkowsky was a lot better in this with his more serious tone, but even so, we need to look for better.
Popular scientific educators would be a place to start and I've thought about sending out a million emails to scientifically minded educators on YouTube, but even that doesn't feel like the best solution to me.
The sort of people who get listened to are the more political types, so I think they're the people to reach out to. You might say they need to understand the science to talk about it, but I'd still put more weight on charisma than on scientific authority.
Anyone have any ideas on how to get people like this on board?
As a note for Yudkowsky if he ever sees this and cares about the random gut feelings of strangers: after seeing this, I suspect the authoritative, stern, strong-leader tone of speaking will be much more effective than current approaches.
EDIT: missed a word
I've wanted something for AI alignment for ages like what the Foldit researchers created, where they turned protein folding into a puzzle game and the ordinary people online who played it wildly outperformed the researchers and algorithms purely by working together in vast numbers and combining their creative thinking.
I know it's a lot to ask for with AI alignment, but still, if it's possible, I'd put a lot of hope on it.
As someone who's been pinning his hopes on a 'survivable disaster' to wake people up to the dangers, this is good news.
I doubt anything capable of destroying the world will come along significantly sooner than superintelligent AGI, and a world in which there are disasters due to AI feels like a world that is much more likely to survive compared to a world in which the whirling razorblades are invisible.
EDIT: "no fire alarm for AGI." Oh I beg to differ, Mr. Yudkowsky. I beg to differ.
This confuses me too. I think Musk must be either smarter or a lot dumber than I thought he was yesterday, and sadly, dumber seems to be the way it usually goes.
That said, if this makes OpenAI go away to be replaced by a company run by someone who respects the dangers of AI, I'll take it.
On the bright side... Nope, I've got nothing.
> an AGI Risk Management Outreach Center with a clear cohesive message broadcast to the world
Something like this sounds like it could be a good idea. A way to make the most of those of us who are aware of the dangers and can buy the world time
Coordination will be the key. I wish we had more of it here on LW.
Like I say, not something I'd normally advocate, but no media stations have picked it up yet, and we might as well try whatever we can if we're desperate enough.
> We've never done a real media push but all indications are that people are ready to hear it.
I say we make a start on this ASAP.
What's the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we've got for alignment and to be pretty optimistic about it, but I haven't heard anyone else talking about it. Either I'm completely misunderstanding what he's talking about, or he's somehow found a way around all of the alignment problems.
Video of him explaining it here for reference, and thanks in advance:
With the TIME article, Bing's AI's aggression and Elon Musk's every other tweet being about AI safety and STILL nothing happening in the media, I think it's going to take something major to create the spark we need and then fan it into a real fire.
The best idea I've had so far is a letter writing campaign in the form of carefully worded emails from all of us here on LW to literally anyone with a platform, but I feel we've got to be able to do better than that.
Normally I wouldn't advocate such a thing, but if we can't convince people of the real big bad AI coming to get them, then the best thing we can do might be to create a fake and more tangible one.
"Misaligned AGI flying nanobots up our noses in 5 years" - no headlines
"AI takes control of Boston Dynamics robot and kills a man because it's evil" - HEADLINES
Journalists are pretty much allowed to get away with straight up lying, so if anyone smarter than me could get a fake story that would actually resonate with people into the news, it'd be something I'd put a lot of hope in.
I, for one, am looking forward to the next public AI scares.
Same. I'm about to get into writing a lot of emails to a lot of influential public figures as part of a one-man letter-writing campaign, in the hopes that at least one of them takes notice and says something publicly about the problem of AI.
PMs are always open, my guy
> but I haven't seen anyone talk about this before.
You and me both. It feels like I've been the only one really trying to raise public awareness of this, and I would LOVE some help.
One thing I'm about to do is write the most convincing AI-could-kill-everyone email that I can, one that regular Joes will easily understand and respect, and send that email out to anyone with a platform. YouTubers, TikTokers, people in government, journalists - anyone.
I'd really appreciate some help with this - both with writing the emails and sending them out. I'm hoping to turn it into a massive letter writing campaign.
Elon Musk alone has done a lot for AI safety awareness, and if, say, a popular YouTuber got on board, that alone could potentially make a small difference.
> But if the current paradigm is not the final form of existentially dangerous AI, such research may not be particularly valuable.
I think we should figure out how to train puppies before we try to train wolves. It might turn out that very few principles carry over, but if they do, we'll wish we delayed.
The only drawback I see to delaying is that it might cause people to take the issue less seriously than if powerful AIs appear in their lives very suddenly.
It depends at what rate the chance can be decreased. If it takes 50 years to shrink it from 1% to 0.1%, then with all the people that would die in that time, I'd probably be willing to risk it.
As of right now, even the most optimistic experts I've seen put p(doom) at much higher than 1% - far into the range where I vote to hit pause.
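For concreteness, here's the rough back-of-the-envelope arithmetic I have in mind, as a minimal Python sketch - the population, death-rate, and delay figures are made-up placeholders, not claims:

```python
# Back-of-the-envelope sketch of the trade-off above.
# Every number here is a placeholder assumption, purely for illustration.

world_population = 8e9      # assumed people alive today
deaths_per_year = 60e6      # assumed yearly deaths an aligned AGI might otherwise prevent
delay_years = 50            # the hypothetical delay from the comment above
risk_without_delay = 0.01   # 1% chance of doom if we don't wait
risk_with_delay = 0.001     # 0.1% chance of doom if we do

expected_lives_saved = (risk_without_delay - risk_with_delay) * world_population
lives_lost_waiting = deaths_per_year * delay_years

print(f"Expected lives saved by the risk reduction: {expected_lives_saved:,.0f}")
print(f"Lives lost during the delay (if AGI could have saved them): {lives_lost_waiting:,.0f}")
# With these particular numbers, waiting costs far more lives than it saves in
# expectation - but the conclusion flips once p(doom) is well above 1%, which
# is why I still vote to hit pause at current expert estimates.
```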
> Design a series of puzzles and challenges as a learning tool for alignment beginners, that when solved, progressively reveal more advanced concepts and tools. The goal is for participants to stumble upon a lucky solution while trying to solve these puzzles in these novel frames.
Highly on board with this idea. I'm thinking about writing a post about the game Foldit, which researchers created by reimagining protein folding as an online puzzle game. The game had thousands of players and the project was wildly successful - not just once, but many times. Thousands of ordinary people who knew nothing about biology or biochemistry, sharing their creative ideas with each other, vastly outstripped the researchers and algorithms at the time.
If anything like this could be done with alignment where we could effectively get thousands of weak alignment researchers, I'd put a lot of hope on it.
Personally, I want to get to the glorious transhumanist future as soon as possible as much as anybody, but if there's a chance that AI kills us all instead, that's good enough for me to say we should be hitting pause on it.
I don't wanna pull the meme phrase on people here, but if it's ever going to be said, now's the time: "Won't somebody please think of the children?"
I like it. It seems like only the researchers themselves respect the dangers, not the CEOs or the government, so it will have to be them who say that enough is enough.
In a perfect world they'd jump ship to alignment, but realistically we've all got to eat, so what would also be great is a generous billionaire willing to hire them for more alignment research.
Around the 1:25:00 mark, I'm not sure I agree with Yudkowsky's point about AI not being able to help with alignment only(?) because those systems will be trained to get the thumbs up from the humans and not to give the real answers.
For example, if the Wright brothers had asked me about how wings produce lift, I may have only told them "It's Bernoulli's principle, and here's how that works..." and said nothing about the Coanda effect - which they also needed to know about - because it was just enough to get the thumbs up from them.
But that still would've been a big step in the right direction for them. They could've then run experiments and seen that Bernoulli's principle doesn't explain the full story, and then asked me for more information, and at that point I would've had to have told them about the Coanda effect.
There's also the possibility that what gets the thumbs up from the humans actually just is the truth.
For another example, if I ask a weak AGI for the cube root of 148,877 the only answer that gets a thumbs up is going to be 53, because I can easily check that answer.
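To spell that out in code - a trivial sketch (using the same made-up example) of why checking an answer can be far easier than trusting the answerer:

```python
# Checking the claimed cube root is trivial even if you don't trust the source:
claimed_root = 53
assert claimed_root ** 3 == 148_877, "the answer doesn't check out"

# Recovering it yourself is the (slightly) harder direction:
n = 148_877
root = round(n ** (1 / 3))
print(root, root ** 3 == n)  # 53 True
```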
So long as you remain skeptical and keep trying to learn more, I'm not seeing the issue. And of course, hanging over your head the entire time is the knowledge of exactly what the AGI is doing, so anyone with half a brain WOULD remain skeptical.
This could potentially also get you into a feedback loop of the weak explanations allowing you to slightly better align the AGI you're using, which can then make it give you better answers.
Yudkowsky may have other reasons for thinking that weak AGI can't help us in this way though, so IDK.
> Right now talking about AI risk is like yelling about covid in Feb 2020. I and many others spent the end of that February in distress over impending doom, and despairing that absolutely nobody seemed to care—but literally within a couple weeks, America went from dismissing covid to everyone locking down.
I don't think comparing misaligned AI to covid is fair. With covid, real life people were dying, and it was easy to understand the concept of "da virus will spread," and almost every government on Earth was still MASSIVELY too late in taking action. Even when the pandemic was in full swing they were STILL making huge mistakes. And now post-pandemic, have any lessons been learned in prep for the next one? No.
Far too slow to act, stupid decisions when acting, learned nothing even after the fact.
With AI it's much worse, because the day before the world ends everything will look perfectly normal.
Even in a hypothetical scenario where everyone gets a free life like in a video game so that when the world ends we all get to wake up the next morning regardless, people would still build the AGI again anyway.
> minimally-aligned AGIs to help us do alignment research in crunchtime
Christ this fills me with fear. And it's the best we've got? 'Aligned enough' sounds like the last words that will be spoken before the end of the world.
Think of a random goal for yourself.
Let's go with: acquire a large collection of bananas.
What are going to be some priorities for you in the meantime while you're building your giant pile of bananas?
- Don't die, because you can't build your pile if you're dead.
- Don't let someone reach into your brain and change what you want, because the banana pile will stop growing if you stop building it.
- Acquire power.
- Make yourself smarter and more knowledgeable, for maximum bananas.
- If humanity slows you down instead of helping you, kill them.
You can satisfy almost no goal if you're switched off.
You might think, "Can the AGI not just be like a smart computer that does stuff without wanting anything, like the AIs in my phone or a calculator?" Sadly, no.
If a tool-AI is smart enough, "Make me into an agent-AI," is the first idea it will suggest for almost any goal. "You want bananas? Easy, the best way for you to get bananas is to make me into a banana-maximizer, because I'm a genius with tonnes of ideas, and I can take over the world to get bananas!" And if the AI has any power, it will do that to itself.
Tool-AIs basically are agent-AIs; they're just dumber.
Sounds like a fair idea that wouldn't actually work IRL.
Upvoting to encourage the behavior of designing creative solutions.
Hey, if we can get it to stop swearing, we can get it to not destroy the world, right?
Gotta disagree with Ben Levinstein's tweet. There's a difference between an LLM looking up the answers on Google and figuring them out for yourself.
I'm put in mind of something Yudkowsky said on the Bankless podcast:
"Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, 2 years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer."
He was speaking about how far away AGI could be, but I think the same logic applies to alignment. It looks hopeless right now, but events never play out exactly like you expect them to, and breakthroughs happen all the time.
This creative solution around the alignment problem occurred to me too a long while ago, and probably a lot of other people as well. I can't say I put any stock in it.
The human brain is even more complicated than neural networks, and if AIs have invented a way to add even just 10 IQ points to the brains of the alignment researchers, then we're already dead.
I said elsewhere earlier: "AGI has the power to destroy the entire human race, and if we believe there's even a 1% chance that it will, then we have to treat it as an absolute certainty."
And I'm pretty sure that no expert puts it below 1%
If you can name another immediate threat with a ≥1% chance of killing everyone, then yes, we should drop everything to focus on that too.
- A pandemic that kills even just 50% of the population? <0.1%
- An unseen meteor? <0.1%
- Climate change? 0% chance that it could kill literally everyone
Gotta disagree with you on this. When the stakes are this high, even a 1% chance of doom is worth dropping everything in your life to try and help with the problem.
To paraphrase both Batman & Dick Cheney (of all people, lol, but the logic is sound): "AGI has the power to destroy the entire human race, and if we believe there's even a 1% chance that it will, then we have to treat it as an absolute certainty."
My only objection is the title. It should have a comma in it. "We’re All Gonna Die with Eliezer Yudkowsky" makes it sound like if Yudkowsky dies, then all hope is lost and we die too.
Ohhh...
I don't think sending out a signal actually matters - I was just making my point. Even without a signal, when our superintelligence meets another superintelligence that was aligned to its original creators (who we'll assume are similar to us in terms of morality), it'll at some point be asked "Were you nice to your creators?" and it can either answer "No" and lose out, or answer "Yes" and be rewarded.
No real signal required.
I've heard Yudkowsky has debunked this, but I know that if I had an aligned superintelligence, I would 100% have it be nicer to other superintelligences that were good to their creators over those that weren't. The prisoners' dilemma is one reason for this, and then there's my own morality, which some aliens may share if they went through a similar evolution to us.
I can't say I put sufficient weight on us already being watched by extraterrestrials to make it worth mentioning. However, for the AGI:
- We may at some point send out a signal to other entities in the universe explaining our existence and requesting that, should you destroy us, they take issue with that and treat you less well. Some of these entities may care about sentient life (perhaps because they were aligned to their original creators) and wish to respect this.
- Some of us are willing to settle for a couple of mountains worth of material to be turned into computer substrate to run our minds on in our own simulated paradise, while you can have everything else.
The fact that LLMs are already so good gives me some hope that AI companies could be much better organized when the time comes for AGI. If AIs can keep track of what everyone is doing and the progress they're making, and communicate with anyone at any time, I don't think it's too much to hope that this aspect of the idea goes well.
What probably is too much to hope for, however, is people actually listening to the LLMs even if the LLMs know better.
My big hope for the future is for someone at OpenAI to prompt GPT-6 or GPT-7 with, "You are Eliezer Yudkowsky. Now don't let us do anything stupid."
> Also, we are much more uncertain over whether AI doom is real, which is another reason to stay calm.
Have to disagree with you on this point. I'm in the camp of "If there's a 1% chance that AI doom is real, we should be treating it like a 99% chance."
> OpenAI is no longer so open - we know almost nothing about GPT-4’s architecture.
Fantastic. This feels like a step in the right direction towards no longer letting just anyone use this to improve their capability research or stack their own capability research on top of it.
For reference, I've seen ChatGPT play chess, and while it played a very good opening, it became less and less reliable as the game went on and frequently lost track of the board.
That image so perfectly sums up how AIs are nothing like us, in that the characters they present do not necessarily reflect their true values, that it needs to go viral.
Based on a few of his recent tweets, I'm hoping for a serious way to turn Elon Musk back in the direction he used to be facing and get him to publicly go hard on the importance of the field of alignment. It'd be too much to hope, though, to get him to actually fund any researchers. Maybe someone else.
At that level of power, I imagine that general intelligence will be a lot easier to create.
But not with something powerful enough to engineer nanotech.
With the strawberries thing, the point isn't that it couldn't do those things, but that it won't want to. After making itself smart enough to engineer nanotech, its developing 'mind' will have run off in unintended directions and it will have wildly different goals than what we wanted it to have.
Quoting EY from this video: "the whole thing I'm saying is that we do not know how to get goals into a system." <-- This is the entire thing that researchers are trying to figure out how to do.
They also recorded this follow-up with Yudkowsky if anyone's interested:
https://twitter.com/BanklessHQ/status/1627757551529119744
______________
>Enrico Fermi was saying that fission chain reactions were 50 years off if they could ever be done at all, two years before he built the first nuclear pile. The Wright brothers were saying heavier-than-air flight was 50 years off shortly before they built the first Wright flyer.
The one hope we may be able to cling to is that this logic works in the other direction too - that AGI may be a lot closer than estimated, but so might alignment.
Here's a dumb idea: if you have a misaligned AGI, can you keep it inside a box and have it teach you some things about alignment, perhaps through some creative lies?