Because the same argument could be made earlier in the "exponential curve". I don't think we should have paused AI (or more broadly CS) in the 50's, and I don't think we should do it now.
Modern misaligned AI systems are good, actually. There's some recent news about Sakana AI developing a system where the agents tried to extend their own runtime by editing their code/config.
This is amazing for safety! Current systems are laughably incapable of posing x-risks. Now, thanks to capabilities research, we have a clear example of behaviour that would be dangerous in a more "serious" system. So we can proceed with empirical research, create and evaluate methods to deal with this specific risk, so that future systems do not have this failure mode.
The future of AI and AI safety has never been brighter.
Expert opinion is an argument for people who are not themselves particularly informed about the topic. For everyone else, it basically turns into an authority fallacy.
This seems like a rather silly argument. You can apply it to pretty much any global change, any technological progress. The world changes, and will change. You can be salty about it, or you can adapt.
And how would one go about procuring such a rock? Asking for a friend.
The ML researchers saying stuff like AGI is 15 years away have either not carefully thought it through, or are lying to themselves or the survey.
Ah yes, the good ol' "If someone disagrees with me, they must be stupid or lying"
For what it's worth, I think you're approaching this in good faith, which I appreciate. But I also think you're approaching the whole thing from a very, uh, lesswrong.com-y perspective, quietly making assumptions and using concepts that are common here, but not anywhere else.
I won't reply to every individual point, because there are a lot of them, so I'm choosing the (subjectively) most important ones.
This is the actual topic. It's the Black Marble thought experiment by Bostrom,
No it's not, and obviously so. The actual topic is AI safety. It's not false vacuum, it's not a black marble, or a marble of any color for that matter.
Connor wasn't talking about the topic, he was building up to the topic using an analogy, a more abstract model of the situation. Which might be fair enough, except you can't just assert this model. I'm sure saying that AI is a black marble will be accepted as true around here, but it would obviously get pushback in that debate, so you shouldn't sneak it past quietly.
Again, Connor is simply correct here. This is not a novel argument. It's Goodhart's Law.
As I'm pretty sure I said in the post, you can apply this reasoning to pretty much any expression of values or goals. Let's say your goal is stopping AI progress. If you're consistent, that means you'd want humanity to go extinct, because then AI progress would stop. This is the exact argument Connor was using; it's so transparent that I'm disappointed you don't see it.
Again, this is what Eliezer, Connor, and I think is the obvious thing that would happen once an unaligned superintelligence exists: it pushes its goals to the limit at the expense of all we value. This is not Connor being unfair; this is literally his position.
Great! So state and defend this position, in this specific case of an unaligned superintelligence! Because the way he did it in the debate was just by extrapolating whatever views Beff expressed, without care for what they actually are, and showing that when you push them to the extreme, they fall apart. Because obviously they do, because of Goodhart's Law. But you can't dismiss a specific philosophy via a rhetorical device that can dismiss any philosophy.
Finally? Connor has been talking about this the whole time. Black marble!
Again, I extremely strongly disagree, but I suspect that's a mannerism common in rationalist circles, using additional layers of abstraction and pretending they don't exist. Black marble isn't the point of the debate. AI safety is. You could put forward the claim that "AI = black marble". I would lean towards disagreeing, I suspect Beff would strongly disagree, and then there could be a debate about this proposition.
Instead, Connor implicitly assumed the conclusion, and then proceeded to argue the obvious next point that "If we assume that AI black marble will kill us all, then we should not build it".
Duh. The point of contention isn't whether we should destroy the world. The point of contention is whether AI will destroy the world.
Connor is correctly making a very legit point here.
He's not making a point. He's again assuming the conclusion. You happen to agree with the conclusion, so you don't have a problem with it.
The conclusion he's assuming is: "Due to the nature of AI, it will progress so quickly going forward that already at this point we need to slow down or stop, because we won't have time to do that later."
My contention with this would be "No, I think AI capabilities will keep growing progressively, and we'll have plenty of time to stop when that becomes necessary."
This is the part that would have to be discussed. Not assumed.
That is a very old, very bad argument.
Believe it or not, I actually agree. Sort of. I think it's not good as an argument, because (for me) it's not meant to be an argument. It's meant to be an analogy. I think we shouldn't worry about overpopulation on Mars because the world we live in will be so vastly different when that becomes an immediate concern. Similarly, I think we shouldn't (overly) worry about superintelligent AGI killing us, because the state of AI technology will be so vastly different when that becomes an immediate concern.
And of course, whether or not the two situations are comparable would be up for debate. I just used this to state my own position, without going to great lengths to justify it.
Yes. That would have been good. I could tell Connor was really trying to get there. Beff wasn't listening though.
I kinda agree here? But the problem is on both sides. Beff was awfully resistant to even innocuous rhetorical devices, which I'd understand if that had started late in the debate, but... it took him like idk 10 minutes to even respond to the initial technology ban question.
At the same time Connor was awfully bad at leading the conversation in that direction. Let's just say he took the scenic route with a debate partner who made it even more scenic.
Besides that (which you didn't even mention), I cannot imagine what Connor possibly could have done differently to meet your unstated standards, given his position. [...] What do you even want from him?
Great question. Ideally, the debate would go something like this.
B: So my view is that we should accelerate blahblah free energy blah AI blah [note: I'm not actually that familiar with the philosophical context, thermodynamic gods and whatever else; it's probably mostly bullshit and imo irrelevant]
C: Yea, so my position is if we build AI without blah and before blah, then we will all die.
B: But the risk of dying is low because of X and Y reasons.
C: It's actually high because of Z, I don't think X is valid because W.
And keep trying to understand at what point exactly they disagree. Clearly they both want humanity/life/something to proliferate in some capacity, so even establishing that common ground in the beginning would be valuable. They did sorta reach it towards the end, but at that point the whole debate was played out.
Overall, I'm highly disappointed that people seem to agree with you. My problem isn't even whether Connor is right, it's how he argued for his positions. Obviously people around here will mostly agree with him. This doesn't mean that his atrocious performance in the debate will convince anyone else that AI safety is important. It's just preaching to the choir.
So I genuinely don't want to be mean, but this reminds me why I dislike so much of philosophy, including many chunks of rationalist writing.
This whole proposition is based on vibes, and is obviously false - just for the sake of philosophy, we decide to ignore the "obvious" part and roll with it for fun.
The chair I'm sitting on is finite. I may not be able to draw a specific boundary, but I can have a bounding box the size of the planet, and that's still finite.
My life as a conscious being, as far as I know, is finite. It started some years ago, it will end some more years in the future. Admittedly I don't have any evidence regarding what happens to qualia after death, but a vibe of infiniteness isn't enough to convince me that I will infinitely keep experiencing things.
My childhood hamster's life was finite. Sure, the particles are still somewhere in my hometown, but that's no longer my hamster, nor my hamster's life.
A day in my local frame is finite. It lasts about 24 hours, depending on how we define it - to be safe, it's surely contained within 48 hours.
This whole thing just feels like... saying things. You can't just say things and assume they are true, or even make sense. But apparently you can do that if you just refer to (ideally eastern) philosophy.
What are the actual costs of running AISC? I participated in it some time ago, and I'm kinda participating this year again (it's complicated). As far as I can tell, the only things required are some amount of organization and maybe a paid Slack workspace. Is this just about salaries for the organizers?
Huh, whaddayaknow, turns out Altman was in the end pushed back, the new interim CEO is someone who is pretty safety-focused, and you were entirely wrong.
Normalize waiting for more details before dropping confident hot takes.
The board has backed down after Altman rallied staff into a mass exodus
[citation needed]
I've seen rumors and speculations, but if you're that confident, I hope you have some sources?
(for the record, I don't really buy the rest of the argument either on several levels, but this part stood out to me the most)
I'm never a big fan of this sort of... cognitive rewiring? Juggling definitions? This post reinforces my bias, since it's written from a point of very strong bias itself.
AI optimists think AI will go well and be helpful.
AI pessimists think AI will go poorly and be harmful.
It's not that deep.
The post itself is bordering on insulting anyone who has a different opinion than the author (who, no doubt, would prefer the label "AI strategist" than "AI extremists"). I was thinking about going into the details of why, but honestly... this is unlikely to be productive discourse coming from a place where the "other side" is immediately compared to nationalists (?!) or extremists (?!!!).
I'm an AI optimist. I think AI will go well and will help humanity flourish, through both capabilities and alignment research. I think things will work out. That's all.
In what sense do you think it will (might) not go well? My guess is that it will not go at all -- some people will show up in the various locations, maybe some local news outlets will pick it up, and within a week it will be forgotten.
Jesus christ, chill. I don't like playing into the meme of "that's why people don't like vegans", but that's exactly why.
And posting something insane followed by an edit of "idk if I endorse comments like this" has got to be the most online rationalist thing ever.
There's a pretty significant difference here in my view -- "carnists" are not a coherent group, not an ideology, they do not have an agenda (unless we're talking about some very specific industry lobbyists who no doubt exist). They're just people who don't care and eat meat.
Ideological vegans (i.e. not people who just happen to not eat meat, but don't really care either way) are a very specific ideological group, and especially if we qualify them like in this post ("EA vegan advocates"), we can talk about their collective traits.
Is this surprising though? When I read the title I was thinking "Yea, that seems pretty obvious"
Often academics justify this on the grounds that you're receiving more than just monetary benefits: you're receiving mentorship and training. We think the same will be true for these positions.
I don't buy this. I'm actually going through the process of getting a PhD at ~40k USD per year, and one of the main reasons why I'm sticking with it is that after that, I have a solid credential that's recognized worldwide, backed by a recognizable name (i.e. my university and my supervisor). You can't provide either of those things.
This offer seems to take the worst of both worlds between academia and industry, but if you actually find someone good at this rate, good for you I suppose
My point is that your comment was extremely shallow, with a bunch of irrelevant information, and in general plagued with the annoying ultra-polite ChatGPT style - in total, not contributing anything to the conversation. You're now defensive about it and skirting around answering the question in the other comment chain ("my endorsed review"), so you clearly intuitively see that this wasn't a good contribution. Try to look inwards and understand why.
It's really good to see this said out loud. I don't necessarily have a broad overview of the funding field, just my experiences of trying to get into it - whether by joining established orgs, or by getting funding for individual research or alignment-adjacent stuff - and ending up in a capabilities research company instead.
I wonder if this is simply the result of the generally bad SWE/CS market right now. People who would otherwise be in big tech or other AI work will be more inclined to do something with alignment. Similarly, if there's less money in tech overall (maybe outside of LLM-based scams), there may be less money for alignment.
Is it a thing now to post LLM-generated comments on LW?
If Orthogonal wants to ever be taken seriously, by far the most important thing is improving the public-facing communication. I invested a more-than-fair amount of time (given the strong prior for "it won't work" with no author credentials, proof-of-concepts, or anything that would quickly nudge that prior) trying to understand QACI, and why it's not just gibberish (both through reading LW posts and interacting with authors/contributors on the discord server), and I'm still mostly convinced there is absolutely nothing of value in this direction.
And now there's this 10k-word-long post, roughly the size of an actual research paper, with no early indication that there's any value to be obtained by reading the whole thing. I know, I'm "telling on myself" by commenting without reading this post, but y'all rarely get any significant comments on LW posts about QACI (as this post points out), and this might be the reason.
The way I see it, the whole thing has the impressive balance of being extremely hand-wavy as a whole, written up in an extremely "chill and down with the kids" manner, with bits and pieces of math sprinkled in various places, often done incorrectly.
Maybe the general academic formalism isn't the worst thing after all - you need an elevator pitch, an abstract, something to read in a minute or two that will give the general idea of what's going on. Then an introduction, expanding on those ideas and providing some more context. And then the rest of the damn research (which I know is in a very early stage and preparadigmatic and all that - but that's not an excuse for bad communication)
When you say "X is not a paradox", how do you define a paradox?
Does the original paper even refer to x-risk? The word "alignment" doesn't necessarily imply that specific aspect.
I feel like this is one of the cases where you need to be very precise about your language, and be careful not to use an "analogous" problem which actually changes the situation.
Consider the first "bajillion dollars vs dying" variant. We know that right now, there are about 8B humans alive. What happens if the exponential increase exceeds that number? We probably have to assume there's an infinite number of humans, fair enough.
What does it mean that "you've chosen to play"? This implies some intentionality, but due to the structure of the game, where the number of players is random, it's not really just up to you.
NOTE: I just realized that the original wording is "you're chosen to play" rather than "you've chosen to play". Damn you, English. I will keep the three variants below, but this means that the right interpretation clearly points towards option B), but the analysis of various interpretations can explain why we even see this as a paradox.
A) One interpretation is "what is the probability that I died given that I played the game?", to which the answer is 0%, because if I died, I wouldn't be around to ask this question.
B) Second interpretation is "The organizer told you there's a slot for you tomorrow in the next (or first) batch. What is the probability that you will die, given that you are going to play the game?". Here the answer is pretty trivially 1/36. You don't need anthropics, counterfactual worlds, blue skies. You will roll the dice, and your survival will depend entirely on the outcome of that roll.
C) The potentially interesting interpretation, that I heard somewhere (possibly here) is: "You heard that your friend participated in this game. Given this information, what is the probability that your friend died during the game?". The probability here will be about 50% -- we know that if N people in total participated, about N/2 people will have died.
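To sanity-check the contrast between B) and C), here's a quick simulation sketch (the 500-batch cap, the seed, and the number of runs are arbitrary choices of mine, not part of the original problem):

```python
import random

def one_game(p=1/36, max_batches=500):
    """Batches of size 1, 2, 4, ... play until some batch rolls snake eyes (prob p each)."""
    total, batch = 0, 1
    for _ in range(max_batches):
        total += batch
        if random.random() < p:
            return total, batch   # everyone in the losing batch dies
        batch *= 2
    return total, 0               # cap reached without a loss (practically never happens)

random.seed(0)
games = [one_game() for _ in range(200_000)]

# Interpretation C: per-game share of participants who died -- lands a bit above 1/2,
# since the losing batch is always slightly more than half of everyone who played.
shares = [dead / total for total, dead in games if dead]
print(sum(shares) / len(shares))

# Interpretation B: every batch faces its own independent roll by construction,
# so someone told "you play in tomorrow's batch" dies with probability exactly 1/36.
```

Either way, it backs up the point that B) and C) are genuinely different questions with different answers.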
Consider now the second variant with snakes and colors. Before the god starts his wicked game, do snakes exist? Or is he creating the snakes as he goes? The first sentence "I am a god, creating snakes." seems to imply that this is the process of how all snakes are created. This is important, because it messes with some interpretations. Another complication is that now, "losing" the roll no longer deletes you from existence, which similarly changes interpretations. Let's look at the three variants again.
A) "What is the probability you have red eyes given that you were created in this process?" -- here the answer will be ~50%, following the same global population argument as in variant C of the first variant. This is the interpretation you seem to be going with in your analysis, which is notably different than the interpretation that seems to be valid in the first variant.
B) If snakes are being created as you go with the batches, this no longer has a meaning. The snake can't reflect on what will happen to him if he's chosen to be created, because he doesn't exist.
C) "Some time after this process, you befriended a snake who's always wearing shades. You find out how he was created. Given this, what is the probability that he has red eyes?" -- the answer, following again the same global population argument, is ~50%
In summary, we need to be careful when switching to a "less violent" equivalent, because it can often entirely change the problem.
Counterpoint: this is needlessly pedantic and a losing fight.
My understanding of the core argument is that "agent" in alignment/safety literature has a slightly different meaning than "agent" in RL. It might be the case that the difference turns out to be important, but there's still some connection between the two meanings.
I'm not going to argue that RL inherently creates "agentic" systems in the alignment sense. I suspect there's at least a strong correlation there (i.e. an RL-trained agent will typically create an agentic system), but that's honestly beside the point.
The term "RL agent" is very well entrenched and de facto a correct technical term for that part of the RL formalism. Just because alignment people use that term differently, doesn't justify going into neighboring fields and demanding them to change their ways.
It's kinda like telling biologists that they shouldn't use the word [matrix](https://en.wikipedia.org/wiki/Matrix_(biology)) because actual matrices are arrays of numbers (or linear maps whatever, mathematicians don't @ me)
And finally, as an example why even if I drank the kool-aid, I absolutely couldn't do the switch you're recommending -- what about multiagent RL? Especially one with homogeneous agents. Doing s/agent/policy/g won't work, because a multiagent algorithm doesn't have to be multipolicy.
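For concreteness, here's roughly what I mean, a minimal sketch with made-up sizes (one shared policy network driving many agents at once):

```python
import torch
import torch.nn as nn

# One policy network shared by every agent: a "multiagent" setup that is not "multipolicy".
obs_dim, n_actions, n_agents = 8, 4, 16
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))

obs = torch.randn(n_agents, obs_dim)      # one observation per agent
logits = policy(obs)                      # the same parameters applied to every agent
actions = torch.distributions.Categorical(logits=logits).sample()
print(actions.shape)                      # torch.Size([16]): 16 agents, 1 policy
```

There are 16 agents in the environment but only one policy, so renaming "agent" to "policy" would just garble the description.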
The appendix on s/reward/reinforcement/g is even more silly in my opinion. RL agents (heh) are designed to seek out the reward. They might fail, but that's the overarching goal.
I would be interested in some advice going a step further -- assuming a roughly sufficient technical skill level (in my case, soon-to-be PhD in an application of ML), as well as an interest in the field, how to actually enter the field with a full-time position? I know independent research is one option, but it has its pros and cons. And companies which are interested in alignment are either very tiny (=not many positions), or very huge (like OpenAI et al., =very selective)
Isn't this extremely easy to directly verify empirically?
Take a neural network $f$ trained on some standard task, like ImageNet or something. Evaluate $|f(kx) - kf(x)|$ on a bunch of samples $x$ from the dataset, and $|f(x+y) - f(x) - f(y)|$ on samples $x, y$. If it's "almost linear", then the difference should be very small on average. I'm not sure right now how to define "very small", but you could compare it e.g. to the distance distribution $|f(x) - f(y)|$ of independent samples, also depending on what the head is.
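Something like this rough sketch is what I have in mind, assuming a pretrained torchvision ResNet and using random tensors as stand-ins for actual dataset samples (you'd want real ImageNet images, and possibly the penultimate features rather than the logits, depending on what the head is):

```python
import torch
import torchvision.models as models

# Pretrained ImageNet classifier as f; k and the batch size are arbitrary choices.
model = models.resnet18(weights="IMAGENET1K_V1").eval()

x = torch.randn(32, 3, 224, 224)   # placeholder inputs; real dataset samples would be better
y = torch.randn(32, 3, 224, 224)
k = 2.0

with torch.no_grad():
    homogeneity = (model(k * x) - k * model(x)).norm(dim=1)          # |f(kx) - k f(x)|
    additivity = (model(x + y) - model(x) - model(y)).norm(dim=1)    # |f(x+y) - f(x) - f(y)|
    baseline = (model(x) - model(y)).norm(dim=1)                     # |f(x) - f(y)|, independent samples

print(homogeneity.mean().item(), additivity.mean().item(), baseline.mean().item())
```

If the network really were "almost linear", the first two numbers should come out tiny compared to the third.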
FWIW my opinion is that all this "circumstantial evidence" is a big non sequitur, and the base statement is fundamentally wrong. But it seems like such an easily testable hypothesis that it's more effort to discuss it than actually verify it.
"Overall, it continually gets more expensive to do the same amount of work"
This doesn't seem supported by the graph? I might be misunderstanding something, but it seems like research funding essentially followed inflation, so it didn't get more expensive in any meaningful terms. The trend even seems to be a little bit downwards for the real value.
Looking for research idea feedback:
Learning to manipulate: consider a system with a large population of agents working on a certain goal, either learned or rule-based, but at this point - fixed. This could be an environment of ants using pheromones to collect food and bring it home.
Now add another agent (or some number of them) which learns in this environment, and tries to get other agents to instead fulfil a different goal. It could be ants redirecting others to a different "home", hijacking their work.
Does this sound interesting? If it works, would it potentially be publishable as a research paper? (or at least a post on LW) Any other feedback is welcome!
But isn't the whole point that the hotel is full initially, and yet can accept more guests?
Has anyone tried to work with neural networks predicting the weights of other neural networks? I'm thinking about that in the context of something like subsystem alignment, e.g. an RL setting where an agent first learns about the environment, and then creates a subagent (by outputting the weights, or some embedding of its policy) which actually obtains some reward.
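To be clear, I'm imagining something hypernetwork-shaped, roughly like this (all the sizes and the tiny two-layer subagent are made up for illustration):

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 16
n_sub_params = obs_dim * hidden + hidden + hidden * act_dim + act_dim

# Outer agent: maps what it has learned about the environment to the subagent's weights.
hypernet = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_sub_params))

def subagent_policy(params: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
    """Interpret a flat parameter vector as a tiny two-layer policy and run it."""
    i = 0
    w1 = params[i:i + obs_dim * hidden].view(hidden, obs_dim); i += obs_dim * hidden
    b1 = params[i:i + hidden]; i += hidden
    w2 = params[i:i + hidden * act_dim].view(act_dim, hidden); i += hidden * act_dim
    b2 = params[i:i + act_dim]
    return torch.tanh(obs @ w1.T + b1) @ w2.T + b2

env_summary = torch.randn(obs_dim)   # stand-in for what the outer agent learned about the environment
params = hypernet(env_summary)       # outer agent emits the subagent's weights
action = subagent_policy(params, torch.randn(obs_dim))
```

The outer network could then be trained end-to-end on whatever reward the emitted subagent actually collects.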
This reminds me of an idea bouncing around my mind recently, admittedly not aiming to solve this problem, but possibly exhibiting it.
Drawing inspiration from human evolution, then given a sufficiently rich environment where agents have some necessities for surviving (like gathering food), they could be pretrained with something like a survival prior which doesn't require any specific reward signals.
Then, agents produced this way could be fine-tuned for downstream tasks, or fine-tuned to obey orders. The problem would arise when an agent is given an order that results in its death. We might want to ensure it follows its original (survival) instinct, unless overridden by a more specific order.
And going back to a multiagent scenario, similar issues might arise when an order would require antisocial behavior in a usually cooperative environment. The AI Economist comes to mind as a setting where this could come into play, since its agents actually learn some nontrivial social relations: https://blog.einstein.ai/the-ai-economist/