Posts
Comments
+1, you convinced me.
I worry this will distract from risks like "making an AI that is smart enough to learn how to hack computers from scratch", but I don't buy the general "don't distract with true things" argument.
"I don't think that there is more than 1% that support direct violence against non-terrorists for its own sake": This seems definitely wrong to me, if you also count Israelis who consider everyone in Gaza a potential terrorist or something like that.
If you offer Israelis:
Button 1: Kill all of Hamas
Button 2: Kill all of Gaza
Then definitely more than 1% would choose Button 2.
I haven't heard of anything like that (but not sure if I would).
Note there are also problems in trying to set up a government using force, in setting up a police force there if they're not interested in it, and in building an education system (which is currently, afaik, very anti-Israel and wouldn't accept Israel's opinions on changes, I think) ((not that I'm excited about Israel's internal education system either)).
I do think Israel provides water, electricity, internet, equipment, and medical equipment (subsidized? free? I'm not sure of all this anyway) to Gaza. I don't know if you count that as something like "building a stockpile of equipment for providing clean drinking water to residents of occupied territory".
I don't claim the current solution is good, I'm just pointing out some problems with what I think you're suggesting (and I'm not judging whether those problems are bigger or smaller).
What do you mean by "building capacity" in this context? (maybe my English isn't good enough, I didn't understand your question)
I was a software developer in the Israeli military (not a data scientist), and I was part of a course that constantly trains software developers for various units.
The big picture is that the military is a huge organization, and there is a ton of room for software to improve everything. I can't talk about specific uses (just like I can't describe our tanks or whatever, sorry if that's what you're asking, and sorry I'm not giving the full picture), but even things like logistics or servers or healthcare have big teams working on them.
Also remember that the military started a long time ago, when there weren't good off-the-shelf solutions for everything, and imagine how big the companies are that make many of the products that you (or orgs) use.
- There are also many Israelis that don't consider Palestinians to be humans worth protecting, but rather evil beings / outgroup / whatever you'd call that.
- Also (with much less confidence), I do think many Palestinians want to kill Israelis because of things that I'd consider brainwashing.
- Hard question - what to do about a huge population that's been brainwashed like that (if my estimation here is correct), or how might a peaceful resolution look?
Not a question, but seems relevant for people who read this post:
Meni Rosenfeld, one of the early LessWrong Israel members, has enlisted:
Source: https://www.facebook.com/meni.rosenfeld/posts/pfbid0bkvfrb3qFTF7U82eMgkZzgMjMT4s3pbGUx7ahgKX1B8hr2n1viYqg9Msz6t3dBUPl (a public post by him)
Any ideas on how much to read this as "Sam's actual opinions" vs "Sam trying to say things that will satisfy the maximum amount of people"?
(do we have priors on his writings? do we have information about him absolutely not meaning one or more of the things here?)
Hey Kaj :)
The part-hiding-complexity here seems to me like "how exactly do you take a-simulation/prediction-of-a-person and get from it the-preferences-of-the-person".
For example, would you simulate a negotiation with the human and how the negotiation would result? Would you simulate asking the human and then do whatever the human answers? (there were a few suggestions in the post, I don't know if you endorse a specific one or if you even think this question is important)
Because (I assume) once OpenAI[1] says "trust our models", that's the point at which it would be useful to publish our breaks.
Breaks that weren't published yet, so that OpenAI couldn't patch them yet.
[unconfident; I can see counterarguments too]
- ^
Or maybe when the regulators or experts or the public opinion say "this model is trustworthy, don't worry"
I'm confused: Wouldn't we prefer to keep such findings private? (at least, keep them until OpenAI says something like "this model is reliable/safe"?)
My guess: You'd reply that finding good talent is worth it?
This seems like great advice, thanks!
I'd be interested in an example for what "a believable story in which this project reduces AI x-risk" looks like, if Dane (or someone else) would like to share.
A link directly to the corrigibility part (skipping unrelated things on the same page):
This post got me to do something like exposure therapy on myself in 10+ situations, where it felt like the "obvious" thing to do. This is a huge amount of life-change-per-post.
My thoughts:
[Epistemic status + impostor syndrome: Just learning, posting my ideas to hear how they are wrong and in hope to interact with others in the community. Don't learn from my ideas]
A)
Victoria: “I don't think that the internet has a lot of particularly effective plans to disempower humanity.
I think:
- Having ready plans on the internet and using them is not part of the normal threat model from an AGI. If that was the problem, we could just filter out those plans from the training set.
- (The internet does have such ideas. I will briefly mention biosecurity, but I prefer not spreading ideas on how to disempower humanity)
B)
[Victoria:] I think coming up with a plan that gets past the defenses of human society requires thinking differently from humans.
TL;DR: I think some ways to disempower humanity don't require thinking differently than humans
I'll split up AI's attack vectors into 3 buckets:
- Attacks that humans didn't even think of (such as what we can do to apes)
- Attacks that humans did think of but are not defending against (for example, we thought about pandemic risks but didn't defend against them very well). Note this does not require thinking about things that humans didn't think about.
- Attacks that humans are actively defending against, such as using robots with guns or trading in the stock market or playing Go (Go probably won't help with taking over the world, but humans are actively working on winning Go games, so I put the example here). Having an AI beat us in one of these does require it to be, in some important (to me) sense, smarter than us, but not all attacks are in this bucket.
C)
[...] requires thinking differently from humans
I think AIs today already think differently than humans in any reasonable sense we could mean that. In fact, if we could make them NOT think differently than humans, my [untrustworthy] opinion is that this would be non-negligible progress towards solving alignment. No?
D)
The intelligence threshold for planning to take over the world isn't low
First, disclaimers:
(1) I'm not an expert and this isn't widely reviewed, (2) I'm intentionally not being detailed in order to not spread ideas on how to take over the world. I'm aware this is bad epistemics and I'm sorry for it; it's the tradeoff I'm picking.
So, mainly based on A, I think a person who is 90% as intelligent as Elon Musk in all dimensions would probably be able to destroy humanity, and so (if I'm right), the intelligence threshold is lower than "the world's smartest human". Again sorry for the lack of detail. [mods, if this was already too much, feel free to edit/delete my comment]
"Doing a Turing test" is a solution to something. What's the problem you're trying to solve?
As a judge, I'd ask the test subject to write me a rap song about turing tests. If it succeeds, I guess it's a ChatGPT ;P
More seriously - it would be nice to find a judge that doesn't know the capabilities and limitations of GPT models. Knowing those is very very useful
[I also just got funded (FTX) to work on this for realsies 😸🙀 ]
I'm still in "learn the field" mode, I didn't pick any direction to dive into, but I am asking myself questions like "how would someone armed with a pretty strong AI take over the world?".
Regarding commitment from the mentor: My current format is "live blogging" in a Slack channel. A mentor could look whenever they want, and comment only on whatever they want to. wdyt?
(But I don't know who to add to such a channel which would also contain the potentially harmful ideas)
This is a problem for me, a few days after starting to (try to) do this kind of research. Any directions?
The main reason for me is that I want feedback on my ideas, to push me away from directions that are totally useless, which I'm afraid to fall into since I'm not an experienced researcher.
I recommend discussing in the original comment as opposed to splitting up the comments between places, if you have something to ask/say
Poll: Agree/Disagree:
Working for a company that advances AI capabilities is a good idea for advancing-safety because you can speak up if you disagree with something, and this outweighs the downside of how you'd help them advance capabilities
Poll: Agree/disagree:
Working for companies that advance AI capabilities is generally a good idea for people worried about AI risk
Could you help me imagine an AGI that "took over" well enough to modify its own code or variables - but chooses not to "wirehead" its utility variable, and rather prefers to do something in the outside world?
This looks like a guide for [working in a company that already has a research agenda, and doing engineering work for them based on what they ask for] and not for [trying to come up with a new research direction that is better than what everyone else is doing], right?
Ah,
I thought it was "I'm going to sacrifice sleep time to get a few extra hours of work"
My bad
- I try to both [be useful] and [have a good life / don't burn out]
- I started thinking a few days ago about investments. Initial thoughts:
- Given we're not all dead, what happens and how to get rich?
- Guess 1: There was a world war or something similar that got to all AI labs worldwide
- My brain: OMG that sounds really bad. Can I somehow avoid the cross fire?
- Guess 2: One specific org can generate 100x more tech and science progress than the entire rest of the world combined
- My brain: I hope they will be publicly tradable, still respect the stock market, and I can buy their stock in advance?
- Problem: Everyone wants to invest in AI companies already. Do I have an advantage?
- Guess 1: There was a world war or something similar that got to all AI labs worldwide
- If there are a few years of vast strangeness before we (probably) all die, can I get very rich beforehand and maybe use that for something?
- (Similar to Guess 2 above, and also doesn't seem promising)
- Given we're not all dead, what happens and how to get rich?
This is just initial; I'm happy for anyone to join the brainstorm, it's easier together.
I disagree with "sleeping less well at night".
I think if you're able to sleep well (if you can handle the logistics/motivation around it, or perhaps if sleeping well is a null action with no cost), it will be a win after a few days (or at most, weeks)
When I ask this question, my formulation is "50% of the AI capabilities researchers [however you define those] stop [however you define that] for 6 months or more".
I think that your definition, "making people change their mind", misses the point that they might, for example, change their mind and then work full time on making the AGI first, since they now know how to solve that very specific failure mode, or whatever.
Epistemic Status: Trying to form my own views, excuse me if I'm asking a silly question.
TL;DR: Maybe this is overfitted to 2020 information?
My Data Science friends tell me that to train a model, we take ~80% of the data, and then we test our model on the remaining 20%.
Regarding your post: I wonder how you'd form your model based on only 2018 information. Would your model nicely predict the 2020 information, or would it need an update (hinting that it is overfitted)? I'm asking this because it seems like the model here depends very much on cutting-edge results, which I would guess makes it very sensitive to new information.
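To make the backtesting idea concrete, here's a toy sketch in Python (all the data points are made up for illustration): fit a simple trend on the pre-2018 points only, then measure the error on the held-out 2018-2020 points. A large held-out error would hint that the model was overfitted to the older data.

```python
def fit_line(points):
    """Ordinary least squares fit of y = a*x + b, in pure Python."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov / var_x
    b = mean_y - a * mean_x
    return a, b

# Hypothetical (year, metric) observations
data = [(2012, 1.0), (2014, 2.1), (2016, 2.9),
        (2018, 4.2), (2019, 4.4), (2020, 5.3)]

train = [(x, y) for x, y in data if x < 2018]   # "what you knew before 2018"
test = [(x, y) for x, y in data if x >= 2018]   # the held-out ~20-50%

a, b = fit_line(train)
errors = [abs((a * x + b) - y) for x, y in test]
print(max(errors))  # how badly the pre-2018 model misses the newer points
```

This is the same idea as a time-based holdout in ML: the split must respect time order, since shuffling would leak "future" information into the training set.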
May I ask what you are calling "general alignment sympathy"? Could you say it in other words or give some examples?
I don't think "infinite space" is enough to have infinite copies of me. You'd also need infinite matter, no?
[putting aside "many worlds" for a moment]
Anonymous question (ask here) :
Why do so many Rationalists assign a negligible probability to unaligned AI wiping itself out before it wipes humanity out?
What if it becomes incredibly powerful before it becomes intelligent enough to not make existential mistakes? (The obvious analogy being: If we're so certain that human wisdom can't keep up with human power, why is AI any different? Or even: If we're so certain that humans will wipe themselves out before they wipe out monkeys, why is AI any different?)
I'm imagining something like: In a bid to gain a decisive strategic advantage over humans and aligned AIs, an unaligned AI amasses an astonishing amount of power, then messes up somewhere (like AlphaGo making a weird, self-destructive move, or humans failing at coordination and nearly nuking each other), and ends up permanently destroying its code and backups and maybe even melting all GPUs and probably taking half the planet with it, but enough humans survive to continue/rebuild civilisation. And maybe it's even the case that hundreds of years later, we've made AI again, and an unaligned AI messes up again, and the cycle repeats itself potentially many, many times because in practice it turns out humans always put up a good fight and it's really hard to kill them all off without AI killing itself first.
Or is this scenario considered doom? (Because we need superintelligent AI in order to spread to the stars?)
(Inspired by Paul's reasoning here: "Most importantly, it seems like AI systems have huge structural advantages (like their high speed and low cost) that suggest they will have a transformative impact on the world (and obsolete human contributions to alignment retracted) well before they need to develop superhuman understanding of much of the world or tricks about how to think, and so even if they have a very different profile of abilities to humans they may still be subhuman in many important ways." and similar to his thoughts here: "One way of looking at this is that Eliezer is appropriately open-minded about existential quantifiers applied to future AI systems thinking about how to cause trouble, but seems to treat existential quantifiers applied to future humans in a qualitatively rather than quantitatively different way (and as described throughout this list I think he overestimates the quantitative difference).")
If this question becomes important, there are people in our community who are... domain experts. We can ask.
Hey,
TL;DR I know a researcher who's going to start studying C. elegans worms in a way that seems interesting as far as I can tell. Should I do something about that?
I'm trying to understand if this is interesting for our community, specifically as a path to brain emulation, which I wonder could be used to (A) prevent people from dying, and/or (B) create a relatively-aligned AGI.
This is the most relevant post I found on LW/EA (so far).
I'm hoping someone with more domain expertise can say something like:
- "OMG we should totally extra fund this researcher and send developers to help with the software and data science and everything!"
- "This sounds pretty close to something useful but there are changes I'd really like to see in that research"
- "Whole brain emulation is science fiction, we'll obviously destroy the world or something before we can implement it"
- "There is a debate on whether this is useful, the main positions are [link] and [link], also totally talk to [person]"
Any chance someone can give me direction?
Thx!
(My background is in software, not biology or neurology)
I heard "Kiwi" is a company with a good reputation, but I haven't tried their head strap myself. I have their controller straps, which I really like.
- (I'm not sure but why would this be important? Sorry for the silly answer, feel free to reply in the anonymous form again)
- I think a good baseline for comparison would be
- Training large ML models (expensive)
- Running trained ML models (much cheaper)
- I think comparing to blockchain is wrong, because
- it was explicitly designed to be resource intensive on purpose (this adds to the security of proof-of-work blockchains)
- there is a financial incentive to use a specific (very high) amount of resources on blockchain mining (because what you get is literally a currency, and this currency has a certain value, so it's worthwhile to spend any money lower than that value on the mining process)
- None of these are true for ML/AI, where your incentive is more something like "do useful things"
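The incentive point above can be put as back-of-the-envelope arithmetic. Here's a sketch with made-up numbers (the reward size and price are hypothetical, not real figures):

```python
# Break-even logic behind proof-of-work resource use (hypothetical numbers).
block_reward_value = 6.25 * 40_000   # USD value of one mined block at an assumed price
blocks_per_day = 144                 # roughly one block every ten minutes

daily_reward = block_reward_value * blocks_per_day

# Rational miners keep adding hardware and electricity until cost approaches
# reward, so total resource use tracks the currency's market value,
# not the amount of "useful" computation being done.
worthwhile_daily_spend = daily_reward * 0.99  # any spend below the reward is profitable

print(round(worthwhile_daily_spend))
```

Nothing analogous pins ML compute spending to a fixed prize: you train or run a model only as long as the output is worth more than the compute, so there's no built-in pressure toward maximal resource burn.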
+1 for the Abusive Relationships section.
I think there's a lot of expected value in a project that raises awareness of "these are good reasons to break up" and/or "here are common-but-very-bad reasons to stay in an abusive relationship", perhaps with support for people who choose to break up. It's a project I sometimes think of starting, but I'm not sure where I'd begin.
Anonymous question (ask here) :
Given all the computation it would be carrying out, wouldn't an AGI be extremely resource-intensive? Something relatively simple like bitcoin mining (simple when compared to the sort of intellectual/engineering feats that AGIs are supposed to be capable of) famously uses up more energy than some industrialized nations.
If you buy a VR (especially if it's an Oculus Quest 2), here's my getting started guide
Just saying I appreciate this post being so short <3
(and still informative)
Ok,
I'm willing to assume for sake of the conversation that the AGI can't get internet-disconnected weapons.
Do you think that would be enough to stop it?
("verified programmatically": I'm not sure what you mean. That new software needs to be digitally signed with a key that is not connected to the internet?)
But don't you think "reverse engineering human instincts" is a necessary part of the solution?
I don't know, I don't have a coherent idea for a solution. Here's one of my best ideas (not so good).
Yudkowsky split up the solutions in his post, see point 24. The first sub-bullet there is about inferring human values.
Maybe someone else will have different opinions
TL;DR: Hacking
Doesn't require trial and error in the sense you're talking about. Totally doable. We're good at it. Just takes time.
What good are humans without their (internet connected) electronics?
How harmless would an AGI be if it had access merely to our (internet connected) existing weapons systems, to send orders to troops, and to disrupt any supplies that rely on the internet?
What do you think?
Update: Anthropic's own computers are connected to the internet. link. This was said publicly by the person in charge of Anthropic's information security.
[extra dumb question warning!]
Why are all the AGI doom predictions around 10%-30% instead of ~99%?
Is it just the "most doom predictions so far were wrong" prior?
I'd be pretty happy to bet on this and then keep discussing it, wdyt? :)
Here are my suggested terms:
- All major AI research labs that we know about (DeepMind, OpenAI, Facebook research, China, perhaps a few more*)
- Stop "research that would advance AGI" for 1 month, defined not as "practical research" but as "research that will be useful for AGI coming sooner". So, for example, if they stopped only half of their "useful to AGI" research but did it for 3 months, you win. If they stopped training models but kept doing the stuff that is the 90% bottleneck (which some might call "theoretical"), I win.
- *You judge all these parameters yourself however you feel like
- I'm just assuming you agree that the labs mentioned above are currently going towards AGI, at least for the purposes of this bet. If you believe something like "openai (and the other labs) didn't change anything about their research but hey, they weren't doing any relevant research in the first place", then say so now
- I might try to convince you to change your mind, or ask others to comment here, but you have the final say
- Regarding "the catastrophe was unambiguously attributed to the AI" - I ask that you judge whether it was unambiguously because of AI, and that you don't rely on public discourse, since the public can't seem to unambiguously agree on anything (like even vaccines being useful).
I suggest we bet $20 or so mainly "for fun"
What do you think?
My answer for myself is that I started practicing: I started talking to some friends about this, hoping to get better at presenting the topic (which is currently something I'm kind of afraid to do) (I also have other important goals like getting an actual inside view model of what's going on)
If you want something more generic, here's one idea:
Do you mean something like "only get 100 paperclips, not more?"
If so - the AGI will never be sure it has exactly 100 paperclips, so it can take lots of precautions to be very, very sure, like turning the whole world into paperclip counters.
I don't know, I'm replying here with my priors from software development.
TL;DR:
Do something that is
- Mostly useful (software/ML/math/whatever are all great and there are others too, feel free to ask)
- Where you have a good fit, so you'll enjoy and be curious about your work, and not burn out from frustration or because someone told you "you must take this specific job"
- Get mentorship so that you'll learn quickly
And this will almost certainly be useful somehow.
Main things my prior is based on:
EA in general and AI Alignment specifically need lots of different "professions". We probably don't want everyone picking the number one profession and nobody doing anything else. We probably want each person doing whatever they're a good fit for.
The amount we "need" is going up over time, not down, and I can imagine it going up much more, but can't really imagine it going down. In other words, I mostly assume that whatever we need today, which is quite a lot, will also be needed in a few years, so there will be lots of good options to pick.